In a multiple testing context, we consider a semiparametric mixture model with two components, where one component is known and corresponds to the distribution of p-values under the null hypothesis, and the other component $f$ is nonparametric and stands for the distribution under the alternative hypothesis. Motivated by the issue of local false discovery rate estimation, we focus here on the estimation of the nonparametric unknown component $f$ in the mixture, relying on a preliminary estimator of the unknown proportion $\theta$ of true null hypotheses. We propose and study the asymptotic properties of two different estimators for this unknown component. The first estimator is a randomly weighted kernel estimator. We establish an upper bound for its pointwise quadratic risk, exhibiting the classical nonparametric rate of convergence over a class of Hölder densities. To our knowledge, this is the first result establishing convergence as well as the corresponding rate for the estimation of the unknown component in this nonparametric mixture. The second estimator is a maximum smoothed likelihood estimator. It is computed through an iterative algorithm, for which we establish a descent property. In addition, these estimators are used in a multiple testing procedure in order to estimate the local false discovery rate. Their respective performances are then compared on synthetic data.

Key words and phrases: False discovery rate; kernel estimation; local false discovery rate; 
maximum smoothed likelihood; multiple testing; p-values; semiparametric mixture model. 



1 Introduction 

In the framework of multiple testing problems (microarray analysis, neuro-imaging, etc.), a mixture model with two populations is considered

$$g(x) = \theta\phi(x) + (1-\theta)f(x), \qquad (1)$$

where $\theta$ is the unknown proportion of true null hypotheses, and $\phi$ and $f$ are the densities of the observations generated under the null and alternative hypotheses, respectively. More precisely, assume that the test statistics are independent and identically distributed (iid) with a continuous distribution under the corresponding null hypotheses, and that we observe the p-values $X_1, X_2, \dots, X_n$ associated with $n$ independent tested hypotheses. Then the density function $\phi$ is the uniform density on $[0,1]$, while the density function $f$ is assumed unknown. The parameters of the model are $(\theta, f)$, where $\theta$ is a Euclidean parameter while $f$ is an infinite-dimensional one, and the model becomes



$$\forall x \in [0,1], \quad g(x) = \theta + (1-\theta)f(x). \qquad (2)$$



In the following, we focus on model (2). A central problem in the multiple testing setup is the control of type I (i.e. false positive) and type II (i.e. false negative) errors. The most popular criterion regarding type I errors is the false discovery rate (FDR), proposed by Benjamini and Hochberg (1995). To set up the notation, let $H_i$ be the $i$-th (null) hypothesis.



The outcome of testing $n$ hypotheses simultaneously can be summarized as indicated in Table 1.



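Table 1: Possible outcomes from testing $n$ hypotheses $H_1, \dots, H_n$

|                  | Accepts $H_i$ | Rejects $H_i$ | Total |
|------------------|---------------|---------------|-------|
| $H_i$ is true    | TN            | FP            | $n_0$ |
| $H_i$ is false   | FN            | TP            | $n_1$ |
| Total            | N             | P             | $n$   |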



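Benjamini and Hochberg (1995) define FDR as the expected proportion of rejections that are incorrect,

$$\mathrm{FDR} = \mathbb{E}\left[\frac{\mathrm{FP}}{\max(\mathrm{P}, 1)}\right] = \mathbb{E}\left[\frac{\mathrm{FP}}{\mathrm{P}} \,\middle|\, \mathrm{P} > 0\right]\mathbb{P}(\mathrm{P} > 0).$$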



They provide a multiple testing procedure that guarantees the bound $\mathrm{FDR} \le \alpha$, for a desired level $\alpha$. Storey (2003) proposes to modify FDR so as to obtain a new criterion, the positive FDR (or pFDR), defined by

$$\mathrm{pFDR} = \mathbb{E}\left[\frac{\mathrm{FP}}{\mathrm{P}} \,\middle|\, \mathrm{P} > 0\right],$$



and argues that it is conceptually more sound than FDR. For microarray data, for instance, the number of hypotheses $n$ is large and the difference between pFDR and FDR is generally small, as the extra factor $\mathbb{P}(\mathrm{P} > 0)$ is very close to 1 (see Liao et al., 2004). In a mixture context, the pFDR is given by

$$\mathrm{pFDR}(x) = \mathbb{P}(H_i \text{ being true} \mid X \le x) = \frac{\theta\Phi(x)}{\theta\Phi(x) + (1-\theta)F(x)},$$

where $\Phi$ and $F$ are the cumulative distribution functions (cdfs) for densities $\phi$ and $f$, respectively. (It is notationally convenient to consider events of the form $X \le x$, but we could just as well consider tail areas to the right, two-tailed events, etc.)
Efron et al. (2001) define the local false discovery rate ($\ell$FDR) to quantify the plausibility of a particular hypothesis being true, given its specific test statistic or p-value. In a mixture framework, the $\ell$FDR is the Bayes posterior probability

$$\ell\mathrm{FDR}(x) = \mathbb{P}(H_i \text{ being true} \mid X = x) = \frac{\theta\phi(x)}{\theta\phi(x) + (1-\theta)f(x)}. \qquad (3)$$



In many multiple testing frameworks, we need information at the individual level about the probability for a given observation to be a false positive (Aubert et al., 2004). This motivates estimating the local false discovery rate $\ell$FDR. Moreover, the quantities pFDR and $\ell$FDR are analytically related by $\mathrm{pFDR}(x) = \mathbb{E}[\ell\mathrm{FDR}(X) \mid X \le x]$. As a consequence (and recalling that the difference between pFDR and FDR is generally small), Robin et al. (2007) propose to estimate FDR by

$$\widehat{\mathrm{FDR}}(x_i) = \frac{1}{i}\sum_{j=1}^{i}\widehat{\ell\mathrm{FDR}}(x_j),$$



where $\widehat{\ell\mathrm{FDR}}$ is an estimator of $\ell$FDR and the observations $\{x_i\}$ are increasingly ordered. A natural strategy to estimate $\ell$FDR is to start by estimating both the proportion $\theta$ and either $f$ or $g$. Another motivation for estimating the parameters in this mixture model comes from the works of Sun and Cai (2007, 2009), who develop adaptive compound decision rules for false discovery rate control. These rules are based on the estimation of the parameters in model (1) (dealing with z-scores) rather than model (2) (dealing with p-values). However, it appears that in some very specific cases (when the alternative is symmetric about the null), the oracle version of their procedure based on the p-values (and thus relying on estimators of the parameters in model (2)) may outperform the one based on model (1) (see Sun and Cai, 2007, for more details). In the following, we are thus interested in estimating the parameters in model (2).
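As a small illustration of the FDR estimator of Robin et al. (2007) recalled above, the Python sketch below averages $\ell$FDR values along the increasingly ordered p-values. To have something concrete to plug in, it uses the true $\ell$FDR of model (2) under a hypothetical alternative $f(x) = \rho(1-x)^{\rho-1}$ with $\rho = 4$ (the first simulation setting of Section 5); in practice these values would be replaced by an estimate $\widehat{\ell\mathrm{FDR}}$. The function names are illustrative only.

```python
def lfdr_model2(x, theta, rho=4.0):
    """True local FDR theta / (theta + (1 - theta) f(x)) in model (2), assuming
    the hypothetical alternative density f(x) = rho * (1 - x)**(rho - 1) on [0, 1]."""
    f_x = rho * (1.0 - x) ** (rho - 1.0)
    return theta / (theta + (1.0 - theta) * f_x)


def fdr_from_lfdr(pvalues, lfdr_values):
    """Return (x_(i), FDR-hat(x_(i))) pairs, where FDR-hat(x_(i)) is the mean of
    the lFDR values attached to the i smallest p-values."""
    order = sorted(range(len(pvalues)), key=lambda k: pvalues[k])
    running_sum = 0.0
    result = []
    for rank, k in enumerate(order, start=1):
        running_sum += lfdr_values[k]
        result.append((pvalues[k], running_sum / rank))
    return result


if __name__ == "__main__":
    pvals = [0.30, 0.01, 0.12, 0.75]
    lfdr = [lfdr_model2(p, theta=0.8) for p in pvals]
    for x, fdr in fdr_from_lfdr(pvals, lfdr):
        print(f"x = {x:.2f}  FDR-hat = {fdr:.3f}")
```

Under this alternative the $\ell$FDR is nondecreasing in $x$, so the running means, and hence the $\widehat{\mathrm{FDR}}$ values, are nondecreasing along the ordered p-values.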



In a previous work (Nguyen and Matias, 2012), we discussed the estimation of the Euclidean part $\theta$ of the parameter in model (2). Thus, we will not consider this point further here. We rather focus on the estimation of the unknown density $f$, relying on a preliminary estimator of $\theta$. We just mention that many estimators of $\theta$ have been proposed in the literature. One of the most well-known is the one proposed by Storey and Tibshirani (2003), which motivates its use in our simulations. Note that some of these estimators are proved to be consistent (under suitable model assumptions). This is for instance the case for the one proposed by Celisse and Robin (2010). Moreover, this latter estimator has been shown to be $\sqrt{n}$-consistent in Nguyen and Matias (2012). This will be used later when assessing the rate of convergence of one of our estimators of $f$.

Now, different modeling assumptions on the marginal density $f$ have been proposed in the literature. For instance, parametric models have been used with Beta distributions for the p-values (see for example Allison et al., 2002; Liao et al., 2004; Pounds and Morris, 2003) or with a Gaussian distribution for the probit transformation of the p-values (McLachlan et al., 2006). In the framework of nonparametric estimation, Strimmer (2008) proposed a modified Grenander density estimator for $f$, which had initially been suggested by Langaas et al. (2005). This approach requires monotonicity constraints on the density $f$. Other nonparametric approaches rely on regularity assumptions on $f$. This is done for instance in Neuvial (2010), who is primarily interested in estimating $\theta$ under the assumption that it is equal to $g(1)$. Relying on a kernel estimator of $g$, he derives nonparametric rates of convergence for $\theta$. Another kernel estimator has been proposed by Robin et al. (2007), along with a multiple testing procedure called kerfdr. This iterative algorithm is inspired by an expectation-maximization (EM) procedure (Dempster et al., 1977). It is proved to be convergent as the number of iterations increases. However, it does not optimize any criterion and, contrary to the original EM algorithm, it does not increase the observed data likelihood function. Besides, the asymptotic properties (with the number of hypotheses $n$) of the kernel estimator underlying Robin et al.'s approach have not been studied. Indeed, its iterative form prevents obtaining any theoretical result on its convergence properties.

The first part of the present work focuses on the properties of a randomly weighted kernel estimator which, in essence, is very similar to the iterative approach proposed by Robin et al. (2007). Thus, this part may be viewed as a theoretical validation of the kerfdr approach that gives some insights about the convergence properties (as the sample size increases) of this method. In particular, we establish that, relying on a preliminary estimator of $\theta$ that converges at parametric rate, we obtain an estimator of the unknown density $f$ that converges at the usual minimax nonparametric rate. To our knowledge, this is the first result establishing convergence as well as the corresponding rate for the estimation of the unknown component in model (2). In a second part, we are interested in a new iterative algorithm for estimating the unknown density $f$ that aims at maximizing a smoothed likelihood. We refer to Paragraph 4.1 in Eggermont and LaRiccia (2001) for an interesting presentation of kernel estimators as maximum smoothed likelihood ones. Here, we base our approach on the work of Levine et al. (2011), who study a maximum smoothed likelihood estimator for multivariate mixtures. The main idea consists in introducing a nonlinear smoothing operator on the unknown component $f$, as proposed in Eggermont and LaRiccia (1995). We prove that the resulting algorithm possesses a desirable descent property, just as an EM algorithm does. We also show that it is competitive with respect to the kerfdr algorithm when used to estimate $\ell$FDR.



The article is organized as follows. We start by recalling in Section 2 the construction of the kernel estimator underlying kerfdr's approach. Then, in Section 3, we study a randomly weighted kernel estimator of the unknown density $f$ and establish an upper bound on its pointwise quadratic risk. This estimator is similar in essence to the one underlying the kerfdr algorithm, whose properties are out of reach because of its iterative form. In a second part (Section 4), we introduce a new iterative algorithm for estimating $f$ by maximizing a smoothed likelihood, and establish that it possesses a descent property. In Section 5, we rely on our different estimators to estimate the local false discovery rate and present some simulations to compare their performances. All the proofs have been postponed to Section 6. Moreover, some of the more technical proofs have been further postponed to Appendix A.



2 Motivating procedure: kerfdr algorithm 
2.1 Estimator's basis 

We start by explaining a natural construction of a kernel estimator of $f$. Since $f$ is completely unspecified, it has to be estimated in a nonparametric way, for example by a kernel estimator. For each hypothesis, we introduce a (latent) random variable $Z_i$ that equals 0 if the null hypothesis $H_i$ is true and 1 otherwise,

$$\forall i = 1, \dots, n, \quad Z_i = \begin{cases} 0 & \text{if } H_i \text{ is true},\\ 1 & \text{otherwise}. \end{cases} \qquad (4)$$

Since $f$ corresponds to the density of the observations distributed under the alternative hypothesis, a weighted kernel estimator seems well suited. Intuitively, it would be convenient to introduce a weight for each observation $X_i$, meant to select this observation only if it comes from $f$. Equivalently, the weights are used to select the indexes $i$ such that $Z_i = 1$. Thus, a natural kernel estimate of $f$ would be

$$\forall x \in [0,1], \quad f_1(x) = \frac{1}{h}\sum_{i=1}^{n}\frac{Z_i}{\sum_{k=1}^{n}Z_k}K_1\left(\frac{x - X_i}{h}\right),$$

where $K_1$ denotes a kernel function (namely a real-valued integrable function such that $\int K_1(u)\,du = 1$) and $h > 0$ is the bandwidth. However, $f_1$ is not an estimator and cannot be directly used, since the random variables $Z_i$ are unknown. A natural approach is to replace them with their conditional expectation given the data $\{X_i\}_{1 \le i \le n}$, namely with the posterior probabilities $\tau(X_i) = \mathbb{E}(Z_i \mid X_i)$ defined by

$$\forall x \in [0,1], \quad \tau(x) = \mathbb{E}(Z_i \mid X_i = x) = \frac{(1-\theta)f(x)}{g(x)} = 1 - \frac{\theta}{g(x)}. \qquad (5)$$

This leads to the following definition

$$\forall x \in [0,1], \quad f_2(x) = \frac{1}{h}\sum_{i=1}^{n}\frac{\tau_i}{\sum_{k=1}^{n}\tau_k}K_1\left(\frac{x - X_i}{h}\right), \qquad (6)$$

where $\tau_i = \tau(X_i)$. Once again, the weight $\tau_i$ depends on the unknown parameters $\theta$ and $f$, and thus $f_2$ is not an estimator but rather an oracle. It is therefore natural to replace the posterior probabilities $\tau_i$ by estimators to obtain a randomly weighted kernel estimator of $f$. This is what is done, in an iterative way, by the kerfdr algorithm below, and further pursued in our approach developed in Section 3.
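To make the two oracles concrete, here is a minimal Python sketch that simulates p-values from model (2), assuming the hypothetical alternative $f(x) = \rho(1-x)^{\rho-1}$ with $\rho = 4$ and a Gaussian kernel for $K_1$ (both are illustrative choices, not prescribed by the text). Because the data are simulated, the latent labels $Z_i$ and the true posterior probabilities $\tau(X_i)$ from (5) are available, so both $f_1$ and $f_2$ can be evaluated; neither is computable on real data.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def f_alt(x, rho=4.0):
    """Hypothetical alternative density rho * (1 - x)**(rho - 1) on [0, 1]."""
    return rho * (1.0 - x) ** (rho - 1.0)


def simulate_model2(n, theta, rho=4.0, seed=0):
    """Draw (X_i, Z_i): Z_i = 0 gives a uniform p-value, Z_i = 1 a draw from f_alt."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        u = rng.random()
        if rng.random() < theta:
            xs.append(u)
            zs.append(0)
        else:
            # inverse cdf of F(x) = 1 - (1 - x)**rho
            xs.append(1.0 - (1.0 - u) ** (1.0 / rho))
            zs.append(1)
    return xs, zs


def weighted_kde(x, data, weights, h, kernel=gaussian_kernel):
    """(1/h) * sum_i [w_i / sum_k w_k] * K((x - X_i) / h)."""
    total = sum(weights)
    return sum(w * kernel((x - xi) / h) for xi, w in zip(data, weights)) / (h * total)


def tau(x, theta, rho=4.0):
    """Posterior probability (5): (1 - theta) f(x) / (theta + (1 - theta) f(x))."""
    fx = f_alt(x, rho)
    return (1.0 - theta) * fx / (theta + (1.0 - theta) * fx)


if __name__ == "__main__":
    theta, h = 0.8, 0.1
    xs, zs = simulate_model2(1000, theta)
    taus = [tau(x, theta) for x in xs]
    for x in (0.05, 0.25, 0.5):
        f1 = weighted_kde(x, xs, zs, h)    # oracle using the latent labels Z_i
        f2 = weighted_kde(x, xs, taus, h)  # oracle (6) using tau(X_i)
        print(f"x={x:.2f}  f1={f1:.3f}  f2={f2:.3f}  true f={f_alt(x):.3f}")
```

The same `weighted_kde` helper serves both oracles; only the weights change, which is exactly the point of replacing $Z_i$ by $\tau(X_i)$.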

2.2 kerfdr procedure 



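Let us first recall the kerfdr algorithm proposed by Robin et al. (2007) as an approximation to the estimator suggested by (6). This algorithm constructs an iterative sequence $\{f^{(s)}\}_{s \ge 0}$ of estimates of the density $f$. Let us be given a preliminary estimator $\hat\theta$ of the proportion $\theta$. We define a weight sequence $\tau^{(s)}$ and a weighted kernel estimator $f^{(s)}$, which depend on each other, by

$$\tau_i^{(s)} = \frac{(1-\hat\theta)f^{(s)}(X_i)}{\hat\theta + (1-\hat\theta)f^{(s)}(X_i)} \quad\text{and}\quad f^{(s)}(x) = \frac{1}{h}\sum_{i=1}^{n}\frac{\tau_i^{(s-1)}}{\sum_{k=1}^{n}\tau_k^{(s-1)}}K\left(\frac{x - X_i}{h}\right),$$

where $K$ is a kernel and $h > 0$ a bandwidth. In the following, we let $K_i(x) = h^{-1}K((x - X_i)/h)$ for $i = 1, \dots, n$.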
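Algorithm 1: kerfdr algorithm
// Initialization;
Set $\tau_i^{(0)} \sim \mathcal{U}([0,1])$, $i = 1, 2, \dots, n$;
while $\max_i |\tau_i^{(s)} - \tau_i^{(s-1)}| / \tau_i^{(s-1)} > \varepsilon$ do
// Update estimation of $f$;
$f^{(s)}(x_i) = \sum_j \tau_j^{(s-1)} K_j(x_i) / \sum_k \tau_k^{(s-1)}$;
// Update estimation of $g$;
$g^{(s)}(x_i) = \hat\theta + (1-\hat\theta)f^{(s)}(x_i)$;
// Update of weights;
$\tau_i^{(s)} = (1-\hat\theta)f^{(s)}(x_i) / g^{(s)}(x_i)$;
$s \leftarrow s + 1$;
// Return;
$f^{(s)}(\cdot) = \sum_i \tau_i^{(s-1)} K_i(\cdot) / \sum_k \tau_k^{(s-1)}$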



The kerfdr algorithm is described in Algorithm 1.

This algorithm has some EM flavor (Dempster et al., 1977). Indeed, updating the weights $\tau^{(s)}$ is equivalent to an expectation step, and $f^{(s)}(x)$ can be seen as a weighted average of $\{K_i(x)\}_{1 \le i \le n}$, so that updating the estimator of $f$ may look like a maximization step. However, the algorithm does not optimize any given criterion. Besides, it does not increase the observed data likelihood function.
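The following Python sketch mirrors the loop of Algorithm 1 on a vector of p-values. It is not the R package kerfdr: it assumes a Gaussian kernel $K$, a fixed bandwidth and a given $\hat\theta$, and it draws the initial weights from $(0,1]$ rather than $[0,1]$ so that the relative-change stopping rule never divides by zero. The matrix of values $K_j(X_i)$ is computed once, since only the weights change across iterations.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kerfdr_sketch(data, theta_hat, h, eps=1e-6, max_iter=300, seed=0):
    """Fixed-point iteration of Algorithm 1; returns (tau, f_hat, n_iter)."""
    n = len(data)
    rng = random.Random(seed)
    # kmat[i][j] = K_j(X_i) = K((X_i - X_j) / h) / h, precomputed once
    kmat = [[gaussian_kernel((xi - xj) / h) / h for xj in data] for xi in data]
    tau = [1.0 - rng.random() for _ in range(n)]  # initial weights in (0, 1]
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        total = sum(tau)
        # update of f at the observations: weighted average of the K_j(X_i)
        f_vals = [sum(t * k for t, k in zip(tau, row)) / total for row in kmat]
        # update of g and of the weights tau_i = (1 - theta) f / g
        new_tau = [(1.0 - theta_hat) * fv / (theta_hat + (1.0 - theta_hat) * fv)
                   for fv in f_vals]
        change = max(abs(a - b) / b for a, b in zip(new_tau, tau))
        tau = new_tau
        if change <= eps:
            break
    total = sum(tau)

    def f_hat(x):
        # returned estimator: sum_i tau_i K_i(x) / sum_k tau_k
        return sum(t * gaussian_kernel((x - xj) / h) for t, xj in zip(tau, data)) / (h * total)

    return tau, f_hat, n_iter


if __name__ == "__main__":
    rng = random.Random(1)
    theta = 0.7
    data = [rng.random() if rng.random() < theta else 1.0 - (1.0 - rng.random()) ** 0.25
            for _ in range(200)]
    tau, f_hat, n_iter = kerfdr_sketch(data, theta_hat=theta, h=0.1)
    print(n_iter, [round(f_hat(x), 3) for x in (0.05, 0.5, 0.95)])
```

Each iteration costs $O(n^2)$ operations because every weight enters every kernel sum, which is one reason the precomputed matrix is worthwhile.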

The relation between $\tau^{(s)}$ and $f^{(s)}$ implies that the sequence $\{\tau^{(s)}\}_{s \ge 0}$ satisfies $\tau^{(s)} = \psi(\tau^{(s-1)})$, where

$$\psi : [0,1]^n \setminus \{0\} \to [0,1]^n, \quad \psi_i(u) = \frac{\sum_j u_j b_{ij}}{\sum_j u_j b_{ij} + \sum_j u_j}, \quad\text{with } b_{ij} = \frac{1-\hat\theta}{\hat\theta}K_j(X_i).$$

Thus, if the sequence $\{\tau^{(s)}\}_{s \ge 0}$ is convergent, it has to converge towards a fixed point of $\psi$. Robin et al. (2007) prove that under some mild conditions, the estimator described in Algorithm 1 is self-convergent, meaning that as the number of iterations $s$ increases, the sequence $f^{(s)}$ converges towards the function

$$x \mapsto \frac{1}{h}\sum_{i=1}^{n}\frac{\tau_i^*}{\sum_{k=1}^{n}\tau_k^*}K\left(\frac{x - X_i}{h}\right),$$

where $\tau^*$ is the (unique) limit of $\{\tau^{(s)}\}_{s \ge 0}$. Note that, contrary to $f_2$, this limit function is a randomly weighted kernel estimator of $f$. However, nothing is known about the convergence of $f^{(s)}$ or of its limit towards the true density $f$ when $n$ tends to infinity (while the bandwidth $h = h_n$ tends to 0). Indeed, the weights $\{\tau^{(s)}\}_{s \ge 0}$ used by the kernel estimator form an iterative sequence, which makes it very difficult to study the convergence properties of this weight sequence or of the corresponding estimator. In the following Section 3, we propose another randomly weighted kernel estimator, whose weights are slightly different from those used in the construction of $f^{(s)}$. More precisely, those weights are not defined iteratively, but they mimic the sequence of weights $\{\tau^{(s)}\}_{s \ge 0}$.



3 Weighted kernel estimation of the unknown density $f$

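As already said, Equation (6) suggests replacing the posterior probabilities $\tau_i$ by estimators to obtain a randomly weighted kernel estimator of $f$. Specifically, let $\hat\theta$ be a given estimator of the proportion $\theta$ and $\hat g_n$ be a nonparametric estimator of the density $g$. We propose here to rely on a (direct) kernel estimator of the density $g$

$$\forall x \in [0,1], \quad \hat g_n(x) = \frac{1}{nh}\sum_{i=1}^{n}K_2\left(\frac{x - X_i}{h}\right), \qquad (7)$$

where $K_2$ is a kernel to be chosen later. We then propose an estimator of the posterior probability $\tau(x)$ defined by

$$\forall x \in [0,1], \quad \hat\tau(x) = 1 - \frac{\hat\theta}{\hat g_n(x)}. \qquad (8)$$

By defining the weight

$$\hat\tau_i = 1 - \frac{\hat\theta}{\hat g_{n,i}(X_i)}, \quad\text{where}\quad \hat g_{n,i}(X_i) = \frac{1}{(n-1)h}\sum_{j \ne i}K_2\left(\frac{X_i - X_j}{h}\right), \qquad (9)$$

we get a randomly weighted kernel estimator of the density $f$

$$\forall x \in [0,1], \quad \hat f_n(x) = \frac{1}{h}\sum_{i=1}^{n}\frac{\hat\tau_i}{\sum_{k=1}^{n}\hat\tau_k}K_1\left(\frac{x - X_i}{h}\right). \qquad (10)$$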
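A direct transcription of (7) to (10) in Python is sketched below, assuming Gaussian kernels for $K_1$ and $K_2$ and reading (9) as a leave-one-out estimate of $g$ at $X_i$. The text does not truncate the weights, so $\hat\tau_i$ may be negative wherever $\hat g_{n,i}(X_i) < \hat\theta$ (for instance near the boundaries of $[0,1]$, where a kernel estimator is biased downwards); the sketch keeps them as they are. The default bandwidth follows the order $h = \alpha n^{-1/(2\beta+1)}$ of Assumption (C1), with $\beta = 2$ (compatible with the order-one Gaussian kernel) and a hypothetical constant $\alpha = 0.5$.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h, kernel=gaussian_kernel):
    """Direct kernel estimator (7) of g."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)


def weighted_kernel_estimator(data, theta_hat, h1=None, h2=None, alpha=0.5, beta=2.0):
    """Weights (9) and estimator (10); returns (tau_hat, f_hat)."""
    n = len(data)
    h_default = alpha * n ** (-1.0 / (2.0 * beta + 1.0))  # order given by Assumption (C1)
    if h1 is None:
        h1 = h_default
    if h2 is None:
        h2 = h_default
    tau_hat = []
    for i, xi in enumerate(data):
        # leave-one-out kernel estimate of g at X_i (our reading of (9))
        g_loo = sum(gaussian_kernel((xi - xj) / h2)
                    for j, xj in enumerate(data) if j != i) / ((n - 1) * h2)
        tau_hat.append(1.0 - theta_hat / g_loo)  # not truncated: may be negative
    total = sum(tau_hat)

    def f_hat(x):
        # randomly weighted kernel estimator (10) with K_1 = Gaussian kernel
        return sum(t * gaussian_kernel((x - xi) / h1)
                   for t, xi in zip(tau_hat, data)) / (h1 * total)

    return tau_hat, f_hat


if __name__ == "__main__":
    rng = random.Random(0)
    theta = 0.8
    data = [rng.random() if rng.random() < theta else 1.0 - (1.0 - rng.random()) ** 0.25
            for _ in range(500)]
    tau_hat, f_hat = weighted_kernel_estimator(data, theta_hat=theta)
    print([round(f_hat(x), 3) for x in (0.05, 0.25, 0.5)])
```

Unlike Algorithm 1, the weights are computed in a single pass, which is what makes the theoretical analysis below tractable.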

Note that it is not necessary to use the same kernel in defining $\hat g_n$ and $\hat f_n$. However, in practice, the choice of the kernel has a negligible influence on the performance of a kernel estimator. We provide below the convergence properties of the estimator $\hat f_n$. These naturally depend on the properties of the plug-in estimators $\hat\theta$ and $\hat g_n$. We are interested here in controlling the pointwise quadratic risk of the estimator $\hat f_n$, which is possible on a class of densities $f$ that are regular enough. In the following, we let $\mathbb{P}_g$ and $\mathbb{E}_g$ respectively denote the probability and expectation of iid random variables with density $g$. In the same way, we denote by $\mathbb{P}_{\theta,f}$ and $\mathbb{E}_{\theta,f}$ the probability and corresponding expectation in the more specific model (2). Moreover, $\lfloor x \rfloor$ denotes the largest integer strictly smaller than $x$. Now, we recall that a kernel $K$ is said to be of order $\ell$ if $\int u^j K(u)\,du = 0$ for $j = 1, \dots, \ell$ (Tsybakov, 2009), and we recall below the definition of Hölder classes of functions.

Definition 1. Fix $\beta > 0$, $L > 0$ and denote by $H(\beta, L)$ the set of functions $\psi : [0,1] \to \mathbb{R}$ that are $\ell$-times continuously differentiable on $[0,1]$ with $\ell = \lfloor\beta\rfloor$ and satisfy

$$|\psi^{(\ell)}(x) - \psi^{(\ell)}(y)| \le L|x - y|^{\beta - \ell}, \quad \forall x, y \in [0,1].$$

The set $H(\beta, L)$ is called the $(\beta, L)$-Hölder class of functions.
We denote by $\Sigma(\beta, L)$ the set

$$\Sigma(\beta, L) = \left\{\psi : \psi \ge 0, \int \psi(x)\,dx = 1 \text{ and } \psi \in H(\beta, L)\right\}.$$

According to the proof of Theorem 1.1 in Tsybakov (2009), we remark that

$$\sup_{\psi \in \Sigma(\beta, L)} \|\psi\|_\infty < +\infty.$$

In order to obtain the rate of convergence of the kernel density estimator $\hat f_n$ to $f$, we introduce the following assumptions:

(A1) The kernel $K$ is a right-continuous function.

(A2) $K$ is of bounded variation.

(A3) The kernel $K$ is of order $\ell = \lfloor\beta\rfloor$ and satisfies
$$\int K(u)\,du = 1, \quad \int K^2(u)\,du < \infty, \quad\text{and}\quad \int |u|^\beta |K(u)|\,du < \infty.$$

(B1) $f$ is a uniformly continuous density function.

(C1) The bandwidth $h$ is of order $\alpha n^{-1/(2\beta+1)}$, $\alpha > 0$.



Note that there exist kernels satisfying Assumptions (A1) to (A3) (see for instance Section 1.2.2 in Tsybakov, 2009). Note also that if $f \in \Sigma(\beta, L)$, it automatically satisfies Assumption (B1).
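As a quick numerical illustration of the order condition in (A3) and of the bandwidth in (C1), the sketch below computes the first moments of two standard kernels, the Gaussian kernel and the Epanechnikov kernel $K(u) = \frac{3}{4}(1-u^2)\mathbf{1}_{|u| \le 1}$, with a midpoint rule. Both have $\int K = 1$, a vanishing first moment and a nonzero second moment, so they are of order 1 and suit $\ell = \lfloor\beta\rfloor = 1$, i.e. $1 < \beta \le 2$. The integration range and step count are arbitrary choices.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def integrate(func, a, b, steps=20000):
    """Composite midpoint rule on [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (k + 0.5) * width) for k in range(steps))


def moment(kernel, j, a=-10.0, b=10.0):
    """int u^j K(u) du, truncated to [a, b]."""
    return integrate(lambda u: u ** j * kernel(u), a, b)


def bandwidth(n, beta, alpha=1.0):
    """h = alpha * n^(-1/(2 beta + 1)), the order prescribed by Assumption (C1)."""
    return alpha * n ** (-1.0 / (2.0 * beta + 1.0))


if __name__ == "__main__":
    for name, ker in (("gaussian", gaussian_kernel), ("epanechnikov", epanechnikov_kernel)):
        print(name, [round(moment(ker, j), 6) for j in range(3)])
    print("h for n = 1000, beta = 2:", round(bandwidth(1000, 2.0), 4))
```

The second moments (1 for the Gaussian kernel, 0.2 for the Epanechnikov kernel) are the leading bias constants for $\beta = 2$.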



Remark 1. i) We first remark that if the kernel $K_2$ satisfies Assumptions (A1) and (A2), and if Assumptions (B1) and (C1) hold, then the kernel density estimator $\hat g_n$ defined by (7) converges uniformly almost surely to $g$ (Wied and Weißbach, 2012). In other words,

$$\|\hat g_n - g\|_\infty \xrightarrow[n \to \infty]{\text{a.s.}} 0.$$

ii) If the kernel $K_2$ satisfies Assumption (A3) and if Assumption (C1) holds, then for all $n \ge 1$,

$$\sup_{x \in [0,1]}\ \sup_{f \in \Sigma(\beta, L)} \mathbb{E}_{\theta,f}\left(|\hat g_n(x) - g(x)|^2\right) \le C n^{-2\beta/(2\beta+1)},$$

where $C = C(\beta, L, \alpha, K_2)$ (see Theorem 1.1 in Tsybakov, 2009).



In the following theorem, we give the rate of convergence to zero of the pointwise quadratic risk of the estimator $\hat f_n$ defined by (10).
Theorem 1. Assume that the kernel $K_1$ satisfies Assumption (A3), that $K_1 \in L_\infty(\mathbb{R})$, and that the kernel $K_2$ satisfies Assumptions (A1) to (A3). If $\hat\theta$ converges almost surely towards $\theta$ and the bandwidth is $h = \alpha n^{-1/(2\beta+1)}$ with $\alpha > 0$, then for any $\delta > 0$, the pointwise quadratic risk of $\hat f_n$ satisfies

$$\sup_{x \in [0,1]}\ \sup_{\theta \in [\delta, 1-\delta]}\ \sup_{f \in \Sigma(\beta, L)} \mathbb{E}_{\theta,f}\left(|\hat f_n(x) - f(x)|^2\right) \le C_1 \sup_{\theta \in [\delta, 1-\delta]}\ \sup_{f \in \Sigma(\beta, L)} [\dots] + C_2 n^{-2\beta/(2\beta+1)},$$

where $C_1, C_2$ are two positive constants depending only on $\beta, L, \alpha, \delta$ and $K_1, K_2$.

The proof of this theorem is postponed to Section 6.1. It works as follows: we first prove that the pointwise quadratic risk of $f_2$ (which is not an estimator) is of order $n^{-2\beta/(2\beta+1)}$. Then we compare the estimator $\hat f_n$ with the function $f_2$ to conclude the proof. In Section 6.2, we prove the following corollary.

Corollary 1. Under the assumptions of Theorem 1, using the estimator $\hat\theta$ proposed in Celisse and Robin (2010), we obtain that for any fixed value $(\theta, f)$, there is some positive constant $C$ such that

$$\sup_{x \in [0,1]} \mathbb{E}_{\theta,f}\left(|\hat f_n(x) - f(x)|^2\right) \le C n^{-2\beta/(2\beta+1)}.$$

Note that the rate $n^{-\beta/(2\beta+1)}$ is the usual nonparametric minimax rate over the class $\Sigma(\beta, L)$ of Hölder densities. However, the corollary states nothing about uniform convergence of $\hat f_n(x)$ with respect to the parameter value $(\theta, f)$, since the convergence of the estimator $\hat\theta$ is not known to be uniform.



4 Maximum smoothed likelihood estimator of the unknown density $f$



In this section, following the lines of Levine et al. (2011), we construct an iterative estimator sequence of the density $f$ that relies on the maximisation of a smoothed likelihood. Assume that the kernel $K$ is positive and symmetric on $\mathbb{R}$. We define its rescaled version $K_h(x) = h^{-1}K(h^{-1}x)$. We consider a linear smoothing operator $S : L_1([0,1]) \to L_1([0,1])$ defined as

$$Sf(x) = \int_0^1 \frac{K_h(u - x)f(u)}{\int_0^1 K_h(s - u)\,ds}\,du, \quad x \in [0,1].$$

We remark that if $f$ is a density on $[0,1]$, then $Sf$ is also a density on $[0,1]$. Let us consider a submodel of model (2) restricted to

$$\mathcal{F} = \{\text{densities } f \text{ on } [0,1] \text{ such that } \log f \in L_1([0,1])\}.$$

We denote by $S^* : L_1([0,1]) \to L_1([0,1])$ the operator

$$S^*f(x) = \frac{\int_0^1 K_h(u - x)f(u)\,du}{\int_0^1 K_h(s - x)\,ds}, \quad x \in [0,1].$$
Note the difference between $S$ and $S^*$: the operator $S^*$ is in fact the adjoint operator of $S$. Then, for a density $f \in \mathcal{F}$, we approximate it by a nonlinear smoothing operator $\mathcal{N}$ defined as

$$\mathcal{N}f(x) = \exp\{(S^*(\log f))(x)\}, \quad x \in [0,1].$$

Note that $\mathcal{N}f$ is not necessarily a density. Now, instead of the classical log-likelihood, we consider (the opposite of) a smoothed version of it as our criterion, namely

$$l_n(\theta, f) = -\frac{1}{n}\sum_{i=1}^{n}\log[\theta + (1-\theta)\mathcal{N}f(X_i)].$$
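To see these operators in action, the Python sketch below discretizes $[0,1]$ with a midpoint grid and replaces every integral by the corresponding Riemann sum, assuming a Gaussian kernel (positive and symmetric, as required). Because the same grid is used for the normalizing integrals, the discrete versions inherit two properties stated above: $S$ preserves total mass, and $S^*$ is exactly the adjoint of $S$ for the grid inner product. The grid size and helper names are illustrative choices.

```python
import math


def k_h(x, h):
    """Rescaled Gaussian kernel K_h(x) = K(x / h) / h."""
    return math.exp(-0.5 * (x / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def midpoint_grid(m):
    return [(k + 0.5) / m for k in range(m)]


def s_star_at(x, f_vals, grid, h):
    """(S* f)(x) = int K_h(u - x) f(u) du / int K_h(s - x) ds."""
    w = 1.0 / len(grid)
    num = w * sum(k_h(u - x, h) * fu for u, fu in zip(grid, f_vals))
    den = w * sum(k_h(s - x, h) for s in grid)
    return num / den


def apply_s(f_vals, grid, h):
    """(S f)(x) = int K_h(u - x) f(u) / [int K_h(s - u) ds] du, on the grid."""
    w = 1.0 / len(grid)
    c = [w * sum(k_h(s - u, h) for s in grid) for u in grid]
    return [w * sum(k_h(u - x, h) * fu / cu for u, fu, cu in zip(grid, f_vals, c))
            for x in grid]


def apply_n_at(x, f_vals, grid, h):
    """(N f)(x) = exp((S* log f)(x)); requires f > 0 on the grid."""
    return math.exp(s_star_at(x, [math.log(v) for v in f_vals], grid, h))


def smoothed_criterion(theta, f_vals, data, grid, h):
    """l_n(theta, f) = -(1/n) sum_i log[theta + (1 - theta) (N f)(X_i)]."""
    log_f = [math.log(v) for v in f_vals]
    total = 0.0
    for x in data:
        nf = math.exp(s_star_at(x, log_f, grid, h))
        total += math.log(theta + (1.0 - theta) * nf)
    return -total / len(data)


if __name__ == "__main__":
    grid, h = midpoint_grid(100), 0.1
    f_vals = [4.0 * (1.0 - u) ** 3 for u in grid]   # hypothetical density on [0, 1]
    sf = apply_s(f_vals, grid, h)
    print("mass of f:", sum(f_vals) / 100, " mass of Sf:", sum(sf) / 100)
    print("l_n(0.8, f):", smoothed_criterion(0.8, f_vals, [0.1, 0.5, 0.9], grid, h))
```

For a constant density $f \equiv 1$, $\log f = 0$, so $\mathcal{N}f \equiv 1$ and $l_n(\theta, f) = 0$ for every $\theta$, which gives a simple sanity check.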



In this section, we denote by $g_0$ the true density of the observations $X_i$. For any fixed value of $\theta$, up to the additive constant $\int_0^1 g_0(x)\log g_0(x)\,dx$, the smoothed log-likelihood $l_n(\theta, f)$ converges almost surely towards $l(\theta, f)$ defined as

$$l(\theta, f) := \int_0^1 g_0(x)\log\frac{g_0(x)}{\theta + (1-\theta)\mathcal{N}f(x)}\,dx.$$

This quantity may be viewed as a penalized Kullback-Leibler divergence between the true density $g_0$ and its smoothed approximation for parameters $(\theta, f)$. Indeed, let $D(a \mid b)$ denote the Kullback-Leibler divergence between (positive) measures $a$ and $b$, defined as

$$D(a \mid b) = \int_0^1\left[a(x)\log\frac{a(x)}{b(x)} + b(x) - a(x)\right]dx.$$

Note that in the above definition, $a$ and $b$ are not necessarily probability measures. Moreover, it can be seen that we still have the property $D(a \mid b) \ge 0$, with equality if and only if $a = b$ (Eggermont, 1999). We now obtain

$$l(\theta, f) = D(g_0 \mid \theta + (1-\theta)\mathcal{N}f) + (1-\theta)\left(1 - \int_0^1 \mathcal{N}f(x)\,dx\right).$$
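The decomposition above is a purely algebraic consequence of $\int_0^1 g_0 = 1$, and it can be checked numerically. The sketch below evaluates both sides with a midpoint rule for a hypothetical density $g_0$ of the form (2) (with $f(x) = 4(1-x)^3$, renormalized on the grid) and an arbitrary positive function standing in for $\mathcal{N}f$, which, as noted above, need not integrate to one.

```python
import math


def kl_positive(a, b, w):
    """D(a | b) = int [a log(a / b) + b - a], midpoint rule with cell width w."""
    return w * sum(ai * math.log(ai / bi) + bi - ai for ai, bi in zip(a, b))


m = 200
w = 1.0 / m
grid = [(k + 0.5) * w for k in range(m)]
theta = 0.8

raw = [theta + (1.0 - theta) * 4.0 * (1.0 - x) ** 3 for x in grid]
mass = w * sum(raw)
g0 = [v / mass for v in raw]                 # integrates to 1 on the grid
nf = [0.2 + x for x in grid]                 # stand-in for N f (integral 0.7)
mix = [theta + (1.0 - theta) * v for v in nf]

criterion = w * sum(g * math.log(g / b) for g, b in zip(g0, mix))   # l(theta, f)
divergence = kl_positive(g0, mix, w)
penalty = (1.0 - theta) * (1.0 - w * sum(nf))

print(criterion, divergence + penalty)       # equal up to rounding
```

Each integrand $a\log(a/b) + b - a$ is nonnegative pointwise, which is why $D(a \mid b) \ge 0$ also holds without normalizing $a$ or $b$.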

The second term on the right-hand side of the above equation acts as a penalization term (Eggermont, 1999; Levine et al., 2011). Our goal is to construct an iterative sequence of estimators of $f$ that possesses a descent property with respect to the criterion $l(\theta, \cdot)$ (for a fixed value of $\theta$). We start by describing such a procedure, relying on the knowledge of the parameters (thus an oracle procedure), in the next section. Then, in Section 4.2, we derive the procedure without this knowledge, obtaining an iterative sequence of estimators.



4.1 An oracle iterative procedure to approximate $f$

In this section, we fix the value of $\theta$ or consider $\theta$ as a given estimator. Let us denote by $l_n(f)$ the smoothed log-likelihood $l_n(\theta, f)$ and by $l(f)$ the limit function $l(\theta, f)$. Our goal is to construct an iterative algorithm which ensures that the value of $l(\cdot)$ decreases at each iteration. Specifically, we construct a sequence of densities $\{f^t\}_{t \ge 0}$ such that

$$l(f^t) - l(f^{t+1}) \ge c\,D(f^{t+1} \mid f^t) \ge 0,$$









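where $c$ is a positive constant depending on $\theta$, the bandwidth $h$ and the kernel $K$. Let us first introduce a weight function

$$\forall f \in \mathcal{F},\ \forall x \in [0,1], \quad \omega_f(x) = \frac{(1-\theta)\mathcal{N}f(x)}{\theta + (1-\theta)\mathcal{N}f(x)},$$

and an operator $G : \mathcal{F} \to L_1([0,1])$

$$G(f)(x) = \alpha_f\int_0^1 \frac{K_h(u - x)\,\omega_f(u)\,g_0(u)}{\int_0^1 K_h(s - u)\,ds}\,du, \quad x \in [0,1],$$

where $\alpha_f^{-1} = \int_0^1 \omega_f(u)g_0(u)\,du$.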



Remark 2. We remark that $G(f)$ is automatically a density on $[0,1]$. Moreover, for every density $f \in \mathcal{F}$, we have $G(f) = S\varphi_f$, where

$$\varphi_f(x) = \frac{\omega_f(x)g_0(x)}{\int_0^1 \omega_f(u)g_0(u)\,du}, \quad x \in [0,1].$$

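We iteratively define a sequence of densities $\{f^t\}_{t \ge 0}$ in $\mathcal{F}$ as

$$f^{t+1}(x) = G(f^t)(x) = \alpha_t\int_0^1 \frac{K_h(u - x)\,\omega_t(u)\,g_0(u)}{\int_0^1 K_h(s - u)\,ds}\,du, \quad x \in [0,1], \qquad (11)$$

where

$$\alpha_t = \frac{1}{\int_0^1 \omega_t(u)g_0(u)\,du} \quad\text{and}\quad \omega_t(x) = \frac{(1-\theta)\mathcal{N}f^t(x)}{\theta + (1-\theta)\mathcal{N}f^t(x)}.$$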

Note that the sequence $\{f^t\}_{t \ge 0}$ does not define an estimator sequence, as it depends on the true density $g_0$, which is unknown. This is why we call it an oracle. We shall define an estimating sequence $\{\hat f^t\}_{t \ge 0}$ as a second step in Section 4.2. Let us now denote by

$$m = \inf_{x \in [-1,1]} K_h(x) \quad\text{and}\quad M = \sup_{x \in [-1,1]} K_h(x);$$

then $m$ and $M$ are two positive constants depending on the bandwidth $h$ and the kernel $K$. We note that for all $x \in [0,1]$,

$$m \le \int_0^1 K_h(u - x)\,du \le \min(M, 1).$$



We also introduce

$$\mathcal{B} = \{S\varphi;\ \varphi \text{ density on } [0,1]\}.$$

According to Remark 2, every function $f^t$ with $t \ge 1$ belongs to $\mathcal{B}$. For all $f \in \mathcal{B}$, we remark that $m \le f(\cdot) \le M/m$. As a consequence, we obtain that the sequence $\{l(f^t)\}_{t \ge 1}$ is lower bounded (by $\int_0^1 g_0\log g_0 - \log(M/m)$). We now state the descent property of the sequence $\{f^t\}_{t \ge 0}$. Its proof may be found in Section 6.2.
Proposition 1. Any iterative sequence of densities $\{f^t\}_{t \ge 0}$ defined as $f^{t+1} = G(f^t)$, with $f^0 \in \mathcal{F}$, satisfies the descent property

$$l(f^t) - l(f^{t+1}) \ge c\,D(f^{t+1} \mid f^t) \ge 0,$$

where $c$ is a positive constant depending on $\theta$, the bandwidth $h$ and the kernel $K$.

Note that the sequence $\{l(f^t)\}_{t \ge 0}$ is decreasing and lower bounded, thus it is convergent. Moreover, each sequence $\{f^t\}_{t \ge 0}$ converges pointwise to a local minimum of $l$.

Remark 3. If $f^*$ is a global minimum of $l$, then $f^*$ is a fixed point of $G$. Indeed, we have

$$0 \ge l(f^*) - l(G(f^*)) \ge c\,D(G(f^*) \mid f^*) \ge 0.$$

It entails that $D(G(f^*) \mid f^*) = 0$ and thus $f^* = G(f^*)$. Note that the converse is not true: a fixed point of $G$ is not necessarily a minimum of the criterion $l$.

Now, by using convexity arguments, we can further prove the uniqueness of a minimum of $l$ and thus the pointwise convergence of the sequence $\{f^t\}_{t \ge 0}$ to this minimum.

Corollary 2. The criterion $l$ has a unique minimum $f^*$ on $\mathcal{B}$. Moreover, any iterative sequence of densities $\{f^t\}_{t \ge 0}$ defined as $f^{t+1} = G(f^t)$, with $f^0 \in \mathcal{F}$, converges pointwise to $f^*$.

We conclude the study of the convergence properties of the sequence $\{f^t\}_{t \ge 0}$ by further establishing its uniform convergence below. The proofs of this proposition and of the former corollary may be found in Section 6.2.



Proposition 2. If there exists a constant $L$ depending on $h$ such that for all $x, y \in [-1,1]$,

$$|K_h(x) - K_h(y)| \le L|x - y|,$$

then the sequence of densities $\{f^t\}_{t \ge 0}$ converges uniformly to $f^*$.

Note that the previous assumption is satisfied by many different kernels. For instance, if $K$ is the density of the standard normal distribution, then the derivative $K_h'(x) = -x h^{-2}K_h(x)$ has maximal absolute value at $|x| = h$, so that this assumption is satisfied with

$$L = \frac{1}{\sqrt{2\pi}\,h^2}e^{-1/2}.$$



4.2 Estimation procedure 

As previously said, the sequence $\{f^t\}_{t \ge 0}$ is an oracle, as it depends on the knowledge of the true density $g_0$, which is unknown. We now want to construct an estimating sequence $\{\hat f^t\}_{t \ge 0}$ of the sequence of functions $\{f^t\}_{t \ge 0}$. Relying on a Monte Carlo method to approximate the integral involved in the definition (11) of $f^{t+1}$, we propose an iterative algorithm to estimate the density $f$: given an initial value $\hat\omega_0 = (\hat\omega_0(1), \dots, \hat\omega_0(n)) \in (0,1)^n$ of the weights, iterate the following steps for $t = 0, 1, 2, \dots$

$$\hat f^{t+1}(x) = \frac{1}{\sum_{i=1}^{n}\hat\omega_t(i)}\sum_{i=1}^{n}\frac{\hat\omega_t(i)\,K_h(x - X_i)}{\int_0^1 K_h(s - X_i)\,ds}, \quad x \in [0,1],$$

and

$$\hat\omega_{t+1}(i) = \frac{(1-\hat\theta)\mathcal{N}\hat f^{t+1}(X_i)}{\hat\theta + (1-\hat\theta)\mathcal{N}\hat f^{t+1}(X_i)}, \quad i = 1, \dots, n.$$



The procedure is summarized in Algorithm 2. Its properties are described in Proposition 3, whose proof is very similar to the proof of Proposition 1 and therefore omitted.

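Algorithm 2: Smoothed-likelihood kernel algorithm
// Initialization;
Set $\hat\omega_0 = (\hat\omega_0(1), \dots, \hat\omega_0(n)) \in (0,1)^n$;
while $\max_i |\hat\omega_t(i) - \hat\omega_{t-1}(i)| / \hat\omega_{t-1}(i) > \varepsilon$ do
// Update estimation of $f$;
$\hat f^{t+1}(\cdot) = \sum_i \hat\omega_t(i) K_h(\cdot - X_i) / \left(\sum_k \hat\omega_t(k)\int_0^1 K_h(s - X_i)\,ds\right)$;
// Update of weights;
$\hat\omega_{t+1}(i) = (1-\hat\theta)\mathcal{N}\hat f^{t+1}(X_i) / \left(\hat\theta + (1-\hat\theta)\mathcal{N}\hat f^{t+1}(X_i)\right)$;
$t \leftarrow t + 1$;
// Return;
$\hat f^{t}(\cdot) = \sum_i \hat\omega_{t-1}(i) K_h(\cdot - X_i) / \left(\sum_k \hat\omega_{t-1}(k)\int_0^1 K_h(s - X_i)\,ds\right)$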



Proposition 3. For any initial value of the weights $\hat\omega_0 \in (0,1)^n$, the sequence of estimators $\{\hat f^t\}_{t \ge 0}$ satisfies

$$l_n(\hat f^t) - l_n(\hat f^{t+1}) \ge c\,D(\hat f^{t+1} \mid \hat f^t) \ge 0,$$

where $c$ is a positive constant depending on $\theta$, the bandwidth $h$ and the kernel $K$.

As a consequence, and since $l_n$ is lower bounded, the sequence $\{\hat f^t\}_{t \ge 0}$ converges to a local minimum of $l_n$ as $t$ increases. Moreover, we recall that as the sample size $n$ increases, the criterion $l_n$ converges (up to a constant) to $l$. Thus, the outcome of Algorithm 2 is an approximation of the minimizer $f^*$ of $l$.
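A minimal Python sketch of Algorithm 2 follows. It assumes a Gaussian kernel and a given $\hat\theta$, and evaluates $\hat f^{t+1}$ on a midpoint grid of $[0,1]$, so that $\int_0^1 K_h(s - X_i)\,ds$ and $\mathcal{N}\hat f^{t+1}(X_i)$ become Riemann sums over the same grid. With this consistent discretization, a standard concavity (majorization) argument shows that the recorded values of $l_n(\hat f^t)$ are nonincreasing, in line with Proposition 3. As in the kerfdr sketch, the initial weights are drawn from $(0,1]$; the grid size, tolerance and iteration cap are illustrative choices.

```python
import math
import random


def k_h(x, h):
    return math.exp(-0.5 * (x / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def msl_sketch(data, theta_hat, h, m=100, eps=1e-6, max_iter=200, seed=0):
    """Algorithm 2 on a grid; returns (grid, f values on grid, weights, l_n history)."""
    n = len(data)
    w = 1.0 / m
    grid = [(k + 0.5) * w for k in range(m)]
    kmat = [[k_h(u - xi, h) for u in grid] for xi in data]   # K_h(u - X_i)
    c = [w * sum(row) for row in kmat]                        # int_0^1 K_h(s - X_i) ds
    rng = random.Random(seed)
    omega = [1.0 - rng.random() for _ in range(n)]            # initial weights in (0, 1]
    history = []
    f_grid = []
    for _ in range(max_iter):
        total = sum(omega)
        # update of f: sum_i omega_i K_h(u - X_i) / c_i, normalized by sum_k omega_k
        f_grid = [sum(om * row[j] / ci for om, row, ci in zip(omega, kmat, c)) / total
                  for j in range(m)]
        log_f = [math.log(v) for v in f_grid]
        # N f(X_i) = exp( int K_h(u - X_i) log f(u) du / c_i )
        nf = [math.exp(w * sum(r * lf for r, lf in zip(row, log_f)) / ci)
              for row, ci in zip(kmat, c)]
        history.append(-sum(math.log(theta_hat + (1.0 - theta_hat) * v) for v in nf) / n)
        # update of weights
        new_omega = [(1.0 - theta_hat) * v / (theta_hat + (1.0 - theta_hat) * v) for v in nf]
        change = max(abs(a - b) / b for a, b in zip(new_omega, omega))
        omega = new_omega
        if change <= eps:
            break
    return grid, f_grid, omega, history


if __name__ == "__main__":
    rng = random.Random(2)
    theta = 0.8
    data = [rng.random() if rng.random() < theta else 1.0 - (1.0 - rng.random()) ** 0.25
            for _ in range(200)]
    grid, f_grid, omega, history = msl_sketch(data, theta_hat=theta, h=0.1)
    print(len(history), "iterations; l_n:", round(history[0], 5), "->", round(history[-1], 5))
```

The grid discretization is a convenience of this sketch; any quadrature works as long as the same one is used for both integrals.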



5 Estimation of local false discovery rate and simulation study 
5.1 Estimation of local false discovery rate 

In this section, we study the estimation of the local false discovery rate ($\ell$FDR) by using the previously introduced estimators of the density $f$, and we compare these different approaches on simulated data. Let us recall the definition (3) of the local false discovery rate, which in model (2) reads

$$\ell\mathrm{FDR}(x) = \mathbb{P}(H_i \text{ being true} \mid X = x) = \frac{\theta}{\theta + (1-\theta)f(x)}, \quad x \in [0,1].$$
For a given estimator $\hat\theta$ of the proportion $\theta$ and an estimator $\hat f$ of the density $f$, we obtain a natural estimator of the local false discovery rate for observation $x_i$

$$\widehat{\ell\mathrm{FDR}}(x_i) = \frac{\hat\theta}{\hat\theta + (1-\hat\theta)\hat f(x_i)}. \qquad (12)$$

Let us now denote by $\hat f_{\mathrm{wk}}$ the weighted kernel estimator of $f$ constructed in Section 3, by $\hat f_{\mathrm{kerfdr}}$ the estimator of $f$ presented in Algorithm 1 and by $\hat f_{\mathrm{msl}}$ the maximum smoothed likelihood estimator of $f$ presented in Algorithm 2. Note that $\hat f_{\mathrm{kerfdr}}$ is available through the R package kerfdr. We also let $\widehat{\ell\mathrm{FDR}}_m$, $m \in \{\mathrm{wk}, \mathrm{kerfdr}, \mathrm{msl}\}$, be the estimators of $\ell$FDR induced by a plug-in of the estimators $\hat f_m$ in (12). We compute the root mean squared error (RMSE) between the estimates and the true values

$$\mathrm{RMSE}_m = \frac{1}{S}\sum_{s=1}^{S}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\{\widehat{\ell\mathrm{FDR}}_m^{(s)}(x_i) - \ell\mathrm{FDR}(x_i)\right\}^2},$$

for $m \in \{\mathrm{wk}, \mathrm{kerfdr}, \mathrm{msl}\}$, where $s = 1, \dots, S$ denotes the simulation index ($S$ being the total number of repeats). The quality of the estimates provided by method $m$ is measured by $\mathrm{RMSE}_m$: the smaller $\mathrm{RMSE}_m$, the better the performance of the method.
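In code, the plug-in (12) and the error criterion amount to a few lines. The Python sketch below takes values $\hat f(x_i)$ from any of the three estimators, assumes that the true $\ell$FDR values are available (as they are in a simulation), and averages the per-repeat RMSEs over the $S$ repeats; this averaging is our reading of the criterion, and the function names are illustrative.

```python
import math


def lfdr_plugin(f_values, theta_hat):
    """Estimator (12): theta_hat / (theta_hat + (1 - theta_hat) * f_hat(x_i))."""
    return [theta_hat / (theta_hat + (1.0 - theta_hat) * fv) for fv in f_values]


def rmse(estimates, truth):
    """Root mean squared error over the n observations of one simulated sample."""
    n = len(truth)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / n)


def average_rmse(repeats):
    """Mean of the per-sample RMSEs; repeats is a list of (estimates, truth) pairs."""
    return sum(rmse(est, tru) for est, tru in repeats) / len(repeats)


if __name__ == "__main__":
    est = lfdr_plugin([2.0, 1.0, 0.5], theta_hat=0.8)
    tru = lfdr_plugin([2.2, 0.9, 0.4], theta_hat=0.8)
    print(est, rmse(est, tru))
```

Note that $\hat f(x_i) = 0$ gives $\widehat{\ell\mathrm{FDR}}(x_i) = 1$, so the plug-in always lies in $(0, 1]$ for nonnegative density estimates and $\hat\theta \in (0,1]$.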

5.2 Simulation study

In this section, we illustrate the previous results on simulated experiments. We simulate sets of p-values according to the mixture model (2). We consider four different values for the proportion ($\theta = 0.7, 0.8, 0.9$ and $0.95$) and three different cases for the alternative distribution $f$. In the first case, we simulate p-values under the alternative with density

$$f(x) = \rho(1-x)^{\rho-1}\,\mathbf{1}_{[0,1]}(x),$$

where $\rho = 4$, as proposed in Celisse and Robin (2010). In the second case, the p-value corresponds to a statistic $T$ with mixture distribution $\theta\,\mathcal{N}(0,1) + (1-\theta)\,\mathcal{N}(\mu,1)$, with $\mu = 2$. In the third case, the p-value corresponds to a statistic $T$ with mixture density $\theta(1/2)\exp\{-|t|\} + (1-\theta)(1/2)\exp\{-|t-\mu|\}$, with $\mu = 1$. For each of the $4 \times 3 = 12$ configurations, we generate $S = 100$ samples of size $n = 1000$. In these experiments, we use the estimator of $\theta$ initially proposed by Schweder and Spjøtvoll (1982), namely

$$\hat\theta = \frac{\#\{X_i > \lambda;\ i = 1, \dots, n\}}{n(1-\lambda)},$$

with parameter value $\lambda = 0.5$, as recommended by Storey and Tibshirani (2003). Figure 1 shows the RMSEs for the twelve configurations and the three methods.
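The first simulation design and the estimator of $\theta$ can be sketched in Python as below, assuming the first alternative density is $f(x) = \rho(1-x)^{\rho-1}$ on $[0,1]$ with $\rho = 4$ (a Beta$(1,\rho)$ density). Only this case is shown because it is fully specified by its density: alternative p-values are drawn by inverting the cdf $F(x) = 1-(1-x)^{\rho}$. This is not the code used to produce Figure 1, and the function names are hypothetical.

```python
import random


def simulate_pvalues_case1(n, theta, rho=4.0, rng=None):
    """Draw n p-values from theta * U(0, 1) + (1 - theta) * f, where
    f(x) = rho * (1 - x) ** (rho - 1) on [0, 1] (a Beta(1, rho) density).

    Alternative p-values use inverse-cdf sampling with F(x) = 1 - (1 - x) ** rho,
    i.e. x = 1 - (1 - u) ** (1 / rho). Returns the p-values and hidden labels.
    """
    rng = rng if rng is not None else random.Random()
    pvalues, is_null = [], []
    for _ in range(n):
        u = rng.random()
        if rng.random() < theta:          # true null: uniform p-value
            pvalues.append(u)
            is_null.append(True)
        else:                             # alternative: Beta(1, rho) p-value
            pvalues.append(1.0 - (1.0 - u) ** (1.0 / rho))
            is_null.append(False)
    return pvalues, is_null


def schweder_spjotvoll(pvalues, lam=0.5):
    """theta_hat = #{X_i > lam} / (n * (1 - lam)); no truncation to [0, 1] is applied."""
    n = len(pvalues)
    return sum(1 for p in pvalues if p > lam) / (n * (1.0 - lam))


rng = random.Random(2024)
pvals, _ = simulate_pvalues_case1(1000, theta=0.8, rng=rng)
print(schweder_spjotvoll(pvals))
```

Under this design the expectation of $\hat\theta$ is $\theta + (1-\theta)(1-\lambda)^{\rho-1}$, i.e. $0.825$ for $\theta = 0.8$, $\lambda = 0.5$, $\rho = 4$: the estimator is slightly conservative because $f(\lambda) > 0$.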

First, note that the three methods exhibit small RMSEs and are thus efficient for estimating $\ell\mathrm{FDR}$. The first method (weighted kernel, wk) tends to perform worse than the other two. Recall that we introduced it only as a way of approaching the theoretical performance of the kerfdr method. The maximum smoothed likelihood (msl) method tends to perform similarly to kerfdr, and in some cases outperforms it. It thus appears as a competitive method for $\ell\mathrm{FDR}$ estimation.
Figure 1: Root mean squared error (RMSE) between the true local false discovery rate and the estimates as a function of the proportion $\theta$. Methods: "□" = wk, "△" = kerfdr, "•" = msl. Top left: first model, top right: second model, bottom left: third model.
6 Proofs



6.1 Proof of Theorem 1

The proof works as follows: we first prove, in the following proposition, that the pointwise quadratic risk of the function $f_2$ defined by (6) is of order $n^{-2\beta/(2\beta+1)}$. Then we compare the estimator $\hat f_n$ with the function $f_2$ to conclude the proof.

We shall need the following two lemmas. The proof of the first one may be found, for instance, in Proposition 1.2 of Tsybakov (2009). The second one is known as Bochner's lemma and is a classical result in kernel density estimation; its proof is therefore omitted.



Lemma 1 (Proposition 1.2 in Tsybakov (2009)). Let $p$ be a density in $\Sigma(\beta, L)$ and $K$ a kernel function of order $\ell = \lfloor \beta \rfloor$ such that

$$\int |u|^{\beta}\,|K(u)|\,du < \infty.$$

Then there exists a positive constant $C_3$ depending only on $\beta$, $L$ and $K$ such that for all $x \in \mathbb{R}$,

$$\left|\int K(u)\,\big[p(x+uh) - p(x)\big]\,du\right| \le C_3\,h^{\beta}, \quad \forall h > 0.$$



Lemma 2 (Bochner's lemma). Let $g$ be a bounded function on $\mathbb{R}$, continuous in a neighborhood of $x_0 \in \mathbb{R}$, and $Q$ a function which satisfies

$$\int |Q(x)|\,dx < \infty.$$

Then we have

$$\lim_{h \to 0}\frac{1}{h}\int Q\Big(\frac{x - x_0}{h}\Big)\,g(x)\,dx = g(x_0)\int Q(x)\,dx.$$



Now, we come to the first step in the proof. 



Proposition 4. Assume that the kernel $K_1$ satisfies Assumption (A3) and that the bandwidth is $h = \alpha n^{-1/(2\beta+1)}$ with $\alpha > 0$. Then the pointwise quadratic risk of the function $f_2$, defined by (6) and depending on $(\theta, f)$, satisfies

$$\sup_{x \in [0,1]}\ \sup_{\theta \in [\delta, 1-\delta]}\ \sup_{f \in \Sigma(\beta, L)} \mathbb{E}_{\theta,f}\big(|f_2(x) - f(x)|^2\big) \le C_4\,n^{-2\beta/(2\beta+1)},$$

where $C_4$ is a positive constant depending only on $\beta$, $L$, $\alpha$, $\delta$ and $K_1$.
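The function $f_2$ controlled in Proposition 4 is an oracle: as the bias and variance computations in the proof below show, it is a kernel estimator with random weights $f(X_i)/\{g(X_i)S_n\}$, where $S_n = \sum_k f(X_k)/g(X_k)$, so it requires the true $f$ and $g$. The Python sketch below evaluates such an oracle on simulated data. The Gaussian choice of $K_1$ and the test density $f(x) = 4(1-x)^3$ are assumptions made only for illustration (Assumption (A3) is not reproduced here), and the helper names are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Standard Gaussian kernel (an illustrative choice for K_1)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def oracle_weights(sample, f, g):
    """Random weights f(X_i) / (g(X_i) * S_n), with S_n = sum_k f(X_k) / g(X_k)."""
    ratios = [f(x) / g(x) for x in sample]
    s_n = sum(ratios)
    return [r / s_n for r in ratios]


def oracle_f2(x, sample, f, g, h, kernel=gaussian_kernel):
    """Oracle weighted kernel estimate f_2(x) = (1/h) * sum_i w_i * K_1((x - X_i) / h)."""
    weights = oracle_weights(sample, f, g)
    return sum(w * kernel((x - xi) / h) for w, xi in zip(weights, sample)) / h


# Illustration with theta = 0.8 and f(x) = 4 (1 - x)^3 on [0, 1].
theta, rho = 0.8, 4.0
f = lambda t: rho * (1.0 - t) ** (rho - 1.0)
g = lambda t: theta + (1.0 - theta) * f(t)
rng = random.Random(1)
sample = [u if rng.random() < theta else 1.0 - (1.0 - u) ** (1.0 / rho)
          for u in (rng.random() for _ in range(2000))]
# h = alpha * n^(-1/(2 beta + 1)) with alpha = 1 and beta = 1,
# for which a Gaussian kernel (order 1) is admissible.
h = len(sample) ** (-1.0 / 3.0)
print(oracle_f2(0.1, sample, f, g, h), f(0.1))
```

Because the weights sum to one, the estimate integrates to one over $\mathbb{R}$; near the boundary of $[0,1]$ part of that mass leaks outside the unit interval.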

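Proof of Proposition 4. Let us denote by

$$S_n = \sum_{i=1}^{n}\frac{f(X_i)}{g(X_i)},$$

and similarly by $S_{n-1}$ (resp. $S_{n-2}$) the same sum restricted to $i = 2, \dots, n$ (resp. $i = 3, \dots, n$). The pointwise quadratic risk of $f_2$ can be written as the sum of a bias term and a variance term:

$$\mathbb{E}_{\theta,f}\big(|f_2(x) - f(x)|^2\big) = \big[\mathbb{E}_{\theta,f}(f_2(x)) - f(x)\big]^2 + \mathrm{Var}_{\theta,f}[f_2(x)].$$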
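Let us first study the bias term. According to (6) and the definition (5) of the weights, we have

$$\begin{aligned}
\mathbb{E}_{\theta,f}[f_2(x)] &= \frac{n}{h}\,\mathbb{E}_{\theta,f}\left[\frac{f(X_1)/g(X_1)}{f(X_1)/g(X_1) + S_{n-1}}\,K_1\Big(\frac{x-X_1}{h}\Big)\right]\\
&= \frac{n}{h}\int_0^1 f(t)\,K_1\Big(\frac{x-t}{h}\Big)\,\mathbb{E}_{\theta,f}\left[\frac{1}{f(t)/g(t) + S_{n-1}}\right]dt\\
&= n\int_{-x/h}^{(1-x)/h} K_1(t)\,f(x+th)\,\mathbb{E}_{\theta,f}\left[\frac{1}{S_{n-1} + f(x+th)/g(x+th)}\right]dt.
\end{aligned}\tag{13}$$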



Since the functions $f$ and $g$ are related by the equation $g(t) = \theta + (1-\theta)f(t)$ for all $t \in [0,1]$, the ratio $f(t)/g(t)$ is well defined and satisfies

$$0 \le \frac{f(t)}{g(t)} \le \frac{1}{1-\theta} \le \delta^{-1}, \quad \forall t \in [0,1],\ \forall \theta \in [\delta, 1-\delta].$$

Then for all $t \in [-x/h, (1-x)/h]$, we get

$$\frac{1}{S_{n-1} + \delta^{-1}} \le \frac{1}{S_{n-1} + f(x+th)/g(x+th)} \le \frac{1}{S_{n-1}},$$

where the bounds are uniform with respect to $t$. By combining this inequality with (13), we obtain



(l-x)/h 



-x/h 

and E e>f [f 2 (x)] < 



Ki{t)f{x + th)dt E ej 



S n -l + S- 1 



n 



(l-x)/h 
x/h 



K 1 (t)f(x + th)dt E 6 



<®9,f[M* 
1 



S n -1 



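Then, we apply the following lemma, whose proof is postponed to Appendix A.1.

Lemma 3. There exist some positive constants $c_1, c_2, c_3, c_4$ (depending on $\delta$) such that for $n$ large enough,

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n}\Big) \le \frac{1}{n} + \frac{c_1}{n^2}, \tag{14}$$

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n^2}\Big) \le \frac{c_2}{n^2}, \tag{15}$$

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n + \delta^{-1}}\Big) \ge \frac{1}{n} - \frac{c_3}{n^2}, \tag{16}$$

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n^2}\Big) - \Big[\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n + 2\delta^{-1}}\Big)\Big]^2 \le \frac{c_4}{n^3}. \tag{17}$$

Relying on Inequalities (14) and (16), we have for $n$ large enough

$$\int_{-x/h}^{(1-x)/h} K_1(t)f(x+th)\,dt - \frac{C}{n} \le \mathbb{E}_{\theta,f}[f_2(x)] \le \int_{-x/h}^{(1-x)/h} K_1(t)f(x+th)\,dt + \frac{C}{n},$$

where $C > 0$ is a constant.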
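Inequalities (14) and (16) say that $n\,\mathbb{E}_{\theta,f}(1/S_n) = 1 + O(1/n)$. A quick Monte Carlo sanity check of this order of magnitude is sketched below in Python, with the illustrative choice $\theta = 0.8$ and $f(x) = 4(1-x)^3$ on $[0,1]$ (an assumption for this check only); it is not part of the proof.

```python
import random


def mc_scaled_inverse_sum(n, reps, theta=0.8, rho=4.0, seed=0):
    """Monte Carlo estimate of n * E[1 / S_n], where S_n = sum_i f(X_i) / g(X_i),
    X_i iid with density g = theta + (1 - theta) f and f(x) = rho (1 - x)^(rho - 1)."""
    rng = random.Random(seed)
    f = lambda x: rho * (1.0 - x) ** (rho - 1.0)
    g = lambda x: theta + (1.0 - theta) * f(x)
    total = 0.0
    for _ in range(reps):
        s_n = 0.0
        for _ in range(n):
            u = rng.random()
            # X_i ~ g: uniform with probability theta, Beta(1, rho) otherwise
            x = u if rng.random() < theta else 1.0 - (1.0 - u) ** (1.0 / rho)
            s_n += f(x) / g(x)
        total += 1.0 / s_n
    return n * total / reps


for n in (50, 200):
    print(n, mc_scaled_inverse_sum(n, reps=1000))  # both values should be close to 1
```

The deviation from 1 shrinks roughly like $\mathrm{Var}(f(X_1)/g(X_1))/n$, in line with the $c_1/n^2$ term of (14).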
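Since $f(x+th) = 0$ for all $t \notin [-x/h, (1-x)/h]$, we may write

$$\int_{-x/h}^{(1-x)/h} K_1(t)f(x+th)\,dt = \int_{\mathbb{R}} K_1(t)f(x+th)\,dt.$$

Thus, since $K_1$ integrates to one, the bias of $f_2(x)$ satisfies

$$|b(x)| = \big|\mathbb{E}_{\theta,f}[f_2(x)] - f(x)\big| \le \left|\int_{\mathbb{R}} K_1(t)\,\big[f(x+th) - f(x)\big]\,dt\right| + \frac{C}{n}.$$

By using Lemma 1 and the choice of bandwidth $h$ (for which $1/n = o(h^{\beta})$), we obtain that

$$b^2(x) \le C_5\,h^{2\beta},$$

where $C_5 = C_5(\beta, L, K_1)$. Let us now study the variance term of $f_2(x)$. We have

$$\mathrm{Var}_{\theta,f}[f_2(x)] = \frac{1}{h^2}\big[n\,\mathrm{Var}_{\theta,f}(Y_1) + n(n-1)\,\mathrm{Cov}_{\theta,f}(Y_1, Y_2)\big], \tag{18}$$

where

$$Y_i = \frac{f(X_i)}{S_n\,g(X_i)}\,K_1\Big(\frac{x - X_i}{h}\Big).$$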

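The variance of $Y_1$ is bounded by its second moment and

$$\mathbb{E}_{\theta,f}(Y_1^2) = \int_0^1 \frac{f^2(t)}{g(t)}\,K_1^2\Big(\frac{x-t}{h}\Big)\,\mathbb{E}_{\theta,f}\left[\frac{1}{\{S_{n-1} + f(t)/g(t)\}^2}\right]dt.$$

Now, recalling that $0 \le f/g \le \delta^{-1}$ and using Inequality (15) of Lemma 3, we get

$$\mathbb{E}_{\theta,f}(Y_1^2) \le h\left(\int_{-x/h}^{(1-x)/h}\frac{f^2(x+th)}{g(x+th)}\,K_1^2(t)\,dt\right)\mathbb{E}_{\theta,f}\Big(\frac{1}{S_{n-1}^2}\Big) \le h\,\delta^{-1}\sup_{t \in [0,1]} f(t)\left(\int K_1^2(t)\,dt\right)\frac{c_2}{(n-1)^2} \le \frac{C_6\,h}{n^2}. \tag{19}$$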



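We now study the covariance of $Y_1$ and $Y_2$:

$$\begin{aligned}
\mathrm{Cov}_{\theta,f}(Y_1, Y_2) &= \mathbb{E}_{\theta,f}(Y_1Y_2) - \mathbb{E}_{\theta,f}^2(Y_1)\\
&= \iint_{[0,1]^2} f(t)f(u)\,K_1\Big(\frac{x-t}{h}\Big)K_1\Big(\frac{x-u}{h}\Big)\,\mathbb{E}_{\theta,f}\left[\frac{1}{\{f(t)/g(t) + f(u)/g(u) + S_{n-2}\}^2}\right]dt\,du\\
&\quad - \left(\int_0^1 f(t)\,K_1\Big(\frac{x-t}{h}\Big)\,\mathbb{E}_{\theta,f}\left[\frac{1}{f(t)/g(t) + S_{n-1}}\right]dt\right)^2\\
&= \iint_{[0,1]^2} f(t)f(u)\,K_1\Big(\frac{x-t}{h}\Big)K_1\Big(\frac{x-u}{h}\Big)\,A(t,u)\,dt\,du,
\end{aligned}$$

where

$$\begin{aligned}
A(t,u) &= \mathbb{E}_{\theta,f}\left[\frac{1}{\{f(t)/g(t) + f(u)/g(u) + S_{n-2}\}^2}\right] - \mathbb{E}_{\theta,f}\left[\frac{1}{f(t)/g(t) + S_{n-1}}\right]\mathbb{E}_{\theta,f}\left[\frac{1}{f(u)/g(u) + S_{n-1}}\right]\\
&\le \mathbb{E}_{\theta,f}\Big(\frac{1}{S_{n-2}^2}\Big) - \left[\mathbb{E}_{\theta,f}\Big(\frac{1}{2\delta^{-1} + S_{n-2}}\Big)\right]^2,
\end{aligned}$$

the last inequality following from $0 \le f/g \le \delta^{-1}$ and from $S_{n-1} = S_{n-2} + f(X_2)/g(X_2) \le S_{n-2} + \delta^{-1}$. Hence

$$\begin{aligned}
\mathrm{Cov}_{\theta,f}(Y_1, Y_2) &\le h^2\left(\int f(x+th)K_1(t)\,dt\right)^2\left\{\mathbb{E}_{\theta,f}\Big(\frac{1}{S_{n-2}^2}\Big) - \left[\mathbb{E}_{\theta,f}\Big(\frac{1}{2\delta^{-1} + S_{n-2}}\Big)\right]^2\right\}\\
&\le C_7\,h^2\left\{\mathbb{E}_{\theta,f}\Big(\frac{1}{S_{n-2}^2}\Big) - \left[\mathbb{E}_{\theta,f}\Big(\frac{1}{2\delta^{-1} + S_{n-2}}\Big)\right]^2\right\}.
\end{aligned}$$

According to Inequality (17) of Lemma 3, we have

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_{n-2}^2}\Big) - \left[\mathbb{E}_{\theta,f}\Big(\frac{1}{2\delta^{-1} + S_{n-2}}\Big)\right]^2 \le \frac{c_4}{(n-2)^3},$$

hence

$$\mathrm{Cov}_{\theta,f}(Y_1, Y_2) \le \frac{C_8\,h^2}{n^3}. \tag{20}$$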



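By returning to Equality (18) and combining it with (19) and (20), we obtain

$$\mathrm{Var}_{\theta,f}[f_2(x)] \le \frac{1}{h^2}\left[n\,\frac{C_6\,h}{n^2} + n(n-1)\,\frac{C_8\,h^2}{n^3}\right] \le \frac{C_9}{nh}.$$

Thus, as the bandwidth $h$ is of order $n^{-1/(2\beta+1)}$, the pointwise quadratic risk of $f_2(x)$ satisfies

$$\mathbb{E}_{\theta,f}\big(|f_2(x) - f(x)|^2\big) \le C_4\,n^{-2\beta/(2\beta+1)}.$$

□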
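The bandwidth $h = \alpha n^{-1/(2\beta+1)}$ is precisely the one that balances the two bounds obtained in this proof, a squared bias of order $h^{2\beta}$ and a variance of order $1/(nh)$. As a worked check, with generic constants $C$ and $C'$:

```latex
h^{2\beta} = \alpha^{2\beta}\, n^{-2\beta/(2\beta+1)},
\qquad
\frac{1}{nh} = \frac{1}{\alpha}\, n^{-1 + 1/(2\beta+1)} = \frac{1}{\alpha}\, n^{-2\beta/(2\beta+1)},
\qquad\text{so}\qquad
C h^{2\beta} + \frac{C'}{nh} = \Big(C\alpha^{2\beta} + \frac{C'}{\alpha}\Big)\, n^{-2\beta/(2\beta+1)}.
```

Any other power $h \asymp n^{-a}$ with $a \neq 1/(2\beta+1)$ makes one of the two terms decay more slowly than $n^{-2\beta/(2\beta+1)}$.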



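Proof of Theorem 1. First, the pointwise quadratic risk of $\hat f_n(x)$ is bounded in the following way:

$$\mathbb{E}_{\theta,f}\big(|\hat f_n(x) - f(x)|^2\big) \le 2\,\mathbb{E}_{\theta,f}\big(|f_2(x) - f(x)|^2\big) + 2\,\mathbb{E}_{\theta,f}\big(|\hat f_n(x) - f_2(x)|^2\big). \tag{21}$$

According to Proposition 4, we have

$$\mathbb{E}_{\theta,f}\big(|f_2(x) - f(x)|^2\big) \le C_4\,n^{-2\beta/(2\beta+1)}, \tag{22}$$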
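and it remains to study the second term on the right-hand side of (21). We write

$$\hat f_n(x) - f_2(x) = \frac{n}{\sum_k \hat\tau_k}\,\frac{1}{nh}\sum_{i=1}^{n}(\hat\tau_i - \tau_i)\,K_1\Big(\frac{x-X_i}{h}\Big) + \frac{n\sum_k(\tau_k - \hat\tau_k)}{\sum_k\hat\tau_k\sum_k\tau_k}\,\frac{1}{nh}\sum_{i=1}^{n}\tau_i\,K_1\Big(\frac{x-X_i}{h}\Big).$$

Moreover, recalling the definition of the weights (5), we have for all $1 \le i \le n$,

$$\hat\tau_i - \tau_i = -\hat\theta\Big[\frac{1}{\tilde g_n(X_i)} - \frac{1}{g(X_i)}\Big] - \frac{1}{g(X_i)}(\hat\theta - \theta),$$

and thus get

$$\begin{aligned}
\hat f_n(x) - f_2(x) = &-\frac{n\hat\theta}{\sum_k\hat\tau_k}\,\frac{1}{nh}\sum_{i=1}^{n}\Big[\frac{1}{\tilde g_n(X_i)} - \frac{1}{g(X_i)}\Big]K_1\Big(\frac{x-X_i}{h}\Big) - \frac{n(\hat\theta - \theta)}{\sum_k\hat\tau_k}\,\frac{1}{nh}\sum_{i=1}^{n}\frac{1}{g(X_i)}\,K_1\Big(\frac{x-X_i}{h}\Big)\\
&+ \frac{n^2\hat\theta}{\sum_k\hat\tau_k\sum_k\tau_k}\,\frac{1}{n}\sum_{k=1}^{n}\Big[\frac{1}{\tilde g_n(X_k)} - \frac{1}{g(X_k)}\Big]\,\frac{1}{nh}\sum_{i=1}^{n}\tau_i\,K_1\Big(\frac{x-X_i}{h}\Big)\\
&+ \frac{n^2(\hat\theta - \theta)}{\sum_k\hat\tau_k\sum_k\tau_k}\,\frac{1}{n}\sum_{k=1}^{n}\frac{1}{g(X_k)}\,\frac{1}{nh}\sum_{i=1}^{n}\tau_i\,K_1\Big(\frac{x-X_i}{h}\Big).
\end{aligned}\tag{23}$$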



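Let us control the different terms appearing in this last equality. We first remark that for all $i$,

$$0 \le \tau_i \le 1 \quad\text{and}\quad \frac{1}{g(X_i)} \le \frac{1}{\theta} \le \delta^{-1}. \tag{24}$$

Since by assumption $\hat\theta_n \to \theta \in [0,1]$ as $n \to \infty$, for $n$ large enough we also get $|\hat\theta| \le 3/2$ a.s. According to the law of large numbers and $\mathbb{E}_{\theta,f}(\tau_1) = 1 - \theta$, we also obtain that for $n$ large enough

$$\frac{\delta}{2} \le \frac{1-\theta}{2} \le \frac{1}{n}\sum_{i=1}^{n}\tau_i \le \frac{3(1-\theta)}{2} \le \frac{3(1-\delta)}{2} \quad\text{a.s.} \tag{25}$$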



Moreover, by using a Taylor expansion of the function $u \mapsto 1/u$ with an integral form of the remainder term, we have for all $i$,

$$\frac{1}{\tilde g_n(X_i)} - \frac{1}{g(X_i)} = -\frac{\tilde g_n(X_i) - g(X_i)}{g^2(X_i)}\int_0^1\Big(1 + s\,\frac{\tilde g_n(X_i) - g(X_i)}{g(X_i)}\Big)^{-2}ds.$$
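Since the convergence of $\hat g_n$ to $g$ holds pointwise and in sup norm (see Remark 1), and since $\tilde g_n$ is a slight modification of $\hat g_n$, we have almost surely, for $n$ large enough, for all $s \in [0,1]$ and all $x \in [0,1]$,

$$s\,\frac{|\tilde g_n(x) - g(x)|}{g(x)} \le \frac{\|\tilde g_n - g\|_\infty}{\theta} \le \frac{1}{2}.$$

Hence, for all $x \in [0,1]$ and large enough $n$,

$$\int_0^1\Big(1 + s\,\frac{\tilde g_n(x) - g(x)}{g(x)}\Big)^{-2}ds = \Big(1 + \frac{\tilde g_n(x) - g(x)}{g(x)}\Big)^{-1} \le 2,$$

and, since $g \ge \theta \ge \delta$, we obtain

$$\Big|\frac{1}{\tilde g_n(X_i)} - \frac{1}{g(X_i)}\Big| \le 2\delta^{-2}\,|\tilde g_n(X_i) - g(X_i)| \quad\text{a.s.} \tag{26}$$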

We also use the following lemma, whose proof is postponed to Appendix A.2.

Lemma 4. For large enough $n$, we have

$$\frac{n}{\sum_{k=1}^{n}\hat\tau_k} \le c_7 \quad\text{a.s.} \tag{27}$$

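By returning to Equality (23) and combining it with (24), (25), (26) and (27), we obtain

$$\begin{aligned}
|\hat f_n(x) - f_2(x)|^2 \le c_8\Bigg\{&\Big(\frac{1}{nh}\sum_{i=1}^{n}|\tilde g_n(X_i) - g(X_i)|\,\Big|K_1\Big(\frac{x-X_i}{h}\Big)\Big|\Big)^2 + |\hat\theta - \theta|^2\Big(\frac{1}{nh}\sum_{i=1}^{n}\Big|K_1\Big(\frac{x-X_i}{h}\Big)\Big|\Big)^2\\
&+ \Big(\frac{1}{n}\sum_{k=1}^{n}|\tilde g_n(X_k) - g(X_k)|\Big)^2\Big(\frac{1}{nh}\sum_{i=1}^{n}\Big|K_1\Big(\frac{x-X_i}{h}\Big)\Big|\Big)^2\Bigg\} \quad\text{a.s.}
\end{aligned}\tag{28}$$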



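We now successively control the expectations $T_1$, $T_2$ and $T_3$ of the three terms appearing in this upper bound. For the first term, we have

$$\begin{aligned}
T_1 &= \mathbb{E}_{\theta,f}\Big[\Big(\frac{1}{nh}\sum_{i=1}^{n}|\tilde g_n(X_i) - g(X_i)|\,\Big|K_1\Big(\frac{x-X_i}{h}\Big)\Big|\Big)^2\Big]\\
&= \mathbb{E}_{\theta,f}\Big[\frac{1}{nh^2}\,|\tilde g_n(X_1) - g(X_1)|^2\,K_1^2\Big(\frac{x-X_1}{h}\Big)\Big]\\
&\quad + \frac{n-1}{n}\,\mathbb{E}_{\theta,f}\Big[\frac{1}{h^2}\,|\tilde g_n(X_1) - g(X_1)|\,|\tilde g_n(X_2) - g(X_2)|\,\Big|K_1\Big(\frac{x-X_1}{h}\Big)K_1\Big(\frac{x-X_2}{h}\Big)\Big|\Big] =: T_{11} + T_{12}.
\end{aligned}$$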
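Now,

$$\begin{aligned}
T_{11} &= \mathbb{E}_{\theta,f}\Big[\frac{1}{nh^2}\,|\tilde g_n(X_1) - g(X_1)|^2\,K_1^2\Big(\frac{x-X_1}{h}\Big)\Big]\\
&= \frac{1}{nh}\int_0^1\mathbb{E}_{\theta,f}\big(|\hat g_{n-1}(t) - g(t)|^2\big)\,K_1^2\Big(\frac{x-t}{h}\Big)\frac{g(t)}{h}\,dt &&\text{(according to definition (9))}\\
&\le \frac{C_{10}\,n^{-2\beta/(2\beta+1)}}{nh}\int_0^1 K_1^2\Big(\frac{x-t}{h}\Big)\frac{g(t)}{h}\,dt &&\text{(according to Remark 1)}\\
&\le C_{11}\,n^{-2\beta/(2\beta+1)} &&\text{(according to Lemma 2)},
\end{aligned}\tag{29}$$

and in the same way

$$T_{12} = \frac{n-1}{n}\,\mathbb{E}_{\theta,f}\Big[\frac{1}{h^2}\,|\tilde g_n(X_1) - g(X_1)|\,|\tilde g_n(X_2) - g(X_2)|\,\Big|K_1\Big(\frac{x-X_1}{h}\Big)K_1\Big(\frac{x-X_2}{h}\Big)\Big|\Big].$$

Conditionally on $X_1 = t$ and $X_2 = s$, we have

$$\tilde g_n(X_1) - g(X_1) = \frac{n-2}{n-1}\big[\hat g_{n-2}(t) - g(t)\big] - \frac{g(t)}{n-1} + \frac{1}{(n-1)h}K_2\Big(\frac{t-s}{h}\Big),$$

and symmetrically for $\tilde g_n(X_2) - g(X_2)$, where $\hat g_{n-2}$ depends only on $X_3, \dots, X_n$. Since $g$ and $K_2$ are bounded, the last two terms are bounded by $c/(nh)$ for some constant $c > 0$, and the Cauchy-Schwarz inequality yields

$$\begin{aligned}
T_{12} &\le \iint_{[0,1]^2}\Big(\mathbb{E}_{\theta,f}^{1/2}\big[|\hat g_{n-2}(t) - g(t)|^2\big] + \frac{c}{nh}\Big)\Big(\mathbb{E}_{\theta,f}^{1/2}\big[|\hat g_{n-2}(s) - g(s)|^2\big] + \frac{c}{nh}\Big)\Big|K_1\Big(\frac{x-t}{h}\Big)K_1\Big(\frac{x-s}{h}\Big)\Big|\frac{g(t)g(s)}{h^2}\,dt\,ds\\
&\le C_{12}\,n^{-2\beta/(2\beta+1)}\Big(\int_0^1\Big|K_1\Big(\frac{x-t}{h}\Big)\Big|\frac{g(t)}{h}\,dt\Big)^2 &&\text{(according to Remark 1, since } 1/(nh) = \alpha^{-1}n^{-2\beta/(2\beta+1)}\text{)}\\
&\le C_{13}\,n^{-2\beta/(2\beta+1)} &&\text{(according to Lemma 2)}.
\end{aligned}\tag{30}$$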
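Thus we get that

$$T_1 = \mathbb{E}_{\theta,f}\Big[\Big(\frac{1}{nh}\sum_{i=1}^{n}|\tilde g_n(X_i) - g(X_i)|\,\Big|K_1\Big(\frac{x-X_i}{h}\Big)\Big|\Big)^2\Big] \le C_{14}\,n^{-2\beta/(2\beta+1)}. \tag{31}$$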



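For the second term on the right-hand side of (28), the Cauchy-Schwarz inequality gives

$$T_2 = \mathbb{E}_{\theta,f}\Big[|\hat\theta - \theta|^2\Big(\frac{1}{nh}\sum_{i=1}^{n}\Big|K_1\Big(\frac{x-X_i}{h}\Big)\Big|\Big)^2\Big] \le \mathbb{E}_{\theta,f}^{1/2}\big(|\hat\theta - \theta|^4\big)\,\mathbb{E}_{\theta,f}^{1/2}\Big[\Big(\frac{1}{nh}\sum_{i=1}^{n}\Big|K_1\Big(\frac{x-X_i}{h}\Big)\Big|\Big)^4\Big].$$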
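The proof of the following lemma is postponed to Appendix A.3.

Lemma 5. There exists a positive constant $C_{15}$ such that

$$\mathbb{E}_{\theta,f}\Big[\Big(\frac{1}{nh}\sum_{i=1}^{n}\Big|K_1\Big(\frac{x-X_i}{h}\Big)\Big|\Big)^4\Big] \le C_{15}. \tag{32}$$

This lemma entails that

$$T_2 \le C_{15}^{1/2}\,\mathbb{E}_{\theta,f}^{1/2}\big(|\hat\theta - \theta|^4\big). \tag{33}$$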



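Now, we turn to the third term on the right-hand side of (28). We have

$$\begin{aligned}
T_3 &= \mathbb{E}_{\theta,f}\Big[\Big(\frac{1}{n}\sum_{k=1}^{n}|\tilde g_n(X_k) - g(X_k)|\Big)^2\Big(\frac{1}{nh}\sum_{i=1}^{n}\Big|K_1\Big(\frac{x-X_i}{h}\Big)\Big|\Big)^2\Big]\\
&= \frac{1}{n^4h^2}\sum_{i,j,k,l=1}^{n}\mathbb{E}_{\theta,f}\Big[|\tilde g_n(X_i) - g(X_i)|\,|\tilde g_n(X_j) - g(X_j)|\,\Big|K_1\Big(\frac{x-X_k}{h}\Big)K_1\Big(\frac{x-X_l}{h}\Big)\Big|\Big].
\end{aligned}$$

By using the same arguments as for obtaining (29) and (30), we can get that

$$T_3 \le C_{16}\,n^{-2\beta/(2\beta+1)}. \tag{34}$$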
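According to (31), (33) and (34), we may conclude that

$$\mathbb{E}_{\theta,f}\big(|\hat f_n(x) - f_2(x)|^2\big) \le C'_{15}\,\big[\mathbb{E}_{\theta,f}(|\hat\theta - \theta|^4)\big]^{1/2} + C_{17}\,n^{-2\beta/(2\beta+1)}. \tag{35}$$

By returning to Inequality (21) and combining it with (22) and (35), we obtain that

$$\mathbb{E}_{\theta,f}\big(|\hat f_n(x) - f(x)|^2\big) \le C_1\,\big[\mathbb{E}_{\theta,f}(|\hat\theta - \theta|^4)\big]^{1/2} + C_2\,n^{-2\beta/(2\beta+1)}.$$

□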



6.2 Other proofs 



Proof of Corollary 1. We now consider the estimator $\hat\theta_n$ initially proposed by Celisse and Robin (2010) and further studied in Nguyen and Matias (2012). From the proof of Theorem 3 in Nguyen and Matias (2012), it may easily be seen that for any value $\gamma \in (\beta/(2\beta+1), 1/2)$,

$$n^{4\gamma}\,\mathbb{E}_{\theta,f}\big(|\hat\theta_n - \theta|^4\big) \xrightarrow[n \to \infty]{} 0.$$

Thus, there exists some constant $C > 0$ such that

$$\sup_{x \in [0,1]}\mathbb{E}_{\theta,f}\big(|\hat f_n(x) - f(x)|^2\big) \le C\,n^{-2\beta/(2\beta+1)}.$$

□
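Proof of Proposition QJ. We control the difference

$$l(f^t) - l(f^{t+1}) = \int_0^1 g_0(x)\log\frac{\theta + (1-\theta)\mathcal{N}f^{t+1}(x)}{\theta + (1-\theta)\mathcal{N}f^t(x)}\,dx = \int_0^1 g_0(x)\log\Big(1 - \omega^t(x) + \omega^t(x)\,\frac{\mathcal{N}f^{t+1}(x)}{\mathcal{N}f^t(x)}\Big)dx.$$

By the concavity of the logarithm function, we get that

$$\begin{aligned}
l(f^t) - l(f^{t+1}) &\ge \int_0^1 g_0(x)\,\omega^t(x)\log\frac{\mathcal{N}f^{t+1}(x)}{\mathcal{N}f^t(x)}\,dx = \int_0^1 g_0(x)\,\omega^t(x)\big[\mathcal{S}^*(\log f^{t+1})(x) - \mathcal{S}^*(\log f^t)(x)\big]dx\\
&= \int_0^1 g_0(x)\,\omega^t(x)\Big(\int_0^1 K_h(s-x)\,ds\Big)^{-1}\int_0^1 K_h(u-x)\log\frac{f^{t+1}(u)}{f^t(u)}\,du\,dx\\
&= \int_0^1\Big(\int_0^1\frac{g_0(x)\,\omega^t(x)\,K_h(u-x)}{\int_0^1 K_h(s-x)\,ds}\,dx\Big)\log\frac{f^{t+1}(u)}{f^t(u)}\,du\\
&= \frac{1}{\alpha_t}\int_0^1 f^{t+1}(u)\log\frac{f^{t+1}(u)}{f^t(u)}\,du = \frac{1}{\alpha_t}\,D(f^{t+1} \mid f^t).
\end{aligned}$$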

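We now establish a lower bound on $\alpha_t^{-1}$. As already mentioned, for all $t \ge 0$, the function $f^t$ is lower bounded by $m$. Since the operator $\mathcal{N}$ is increasing, it follows that $\mathcal{N}f^t$ is also lower bounded by $m$. Now the function

$$x \mapsto \frac{(1-\theta)x}{\theta + (1-\theta)x}$$

is increasing, so that we finally obtain

$$\alpha_t^{-1} = \int_0^1 g_0(x)\,\omega^t(x)\,dx = \int_0^1 g_0(x)\,\frac{(1-\theta)\mathcal{N}f^t(x)}{\theta + (1-\theta)\mathcal{N}f^t(x)}\,dx \ge \frac{(1-\theta)m}{\theta + (1-\theta)m}.$$

This concludes the proof. □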
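To make the descent property concrete, here is a Python sketch of a discretized version of the iteration, written from the quantities appearing in the proofs above: the update $f^{t+1}(u) = \alpha_t\int_0^1 g_0(x)\,\omega^t(x)\,K_h(u-x)/\int_0^1 K_h(s-x)\,ds\;dx$ with $\omega^t = (1-\theta)\mathcal{N}f^t/\{\theta + (1-\theta)\mathcal{N}f^t\}$, the smoothing $\mathcal{N}f(x) = \exp\{\int K_h(u-x)\log f(u)\,du / \int K_h(s-x)\,ds\}$, and the criterion $l(f) = -\int g_0\log\{\theta + (1-\theta)\mathcal{N}f\}$ (up to an additive constant). Assumptions not taken from the paper: a Gaussian kernel, a midpoint rule on $[0,1]$, the empirical distribution of the sample in place of $g_0$, a known $\theta$, and a uniform starting density. Algorithm 2 itself is not reproduced, so this illustrates the mechanism rather than the authors' implementation; since the Jensen argument applies verbatim to the discretized criterion, the recorded values should be non-increasing.

```python
import math
import random


def gaussian_kh(v, h):
    """Gaussian kernel K_h(v) = K(v / h) / h (an illustrative choice of kernel)."""
    return math.exp(-0.5 * (v / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def msl_iterations(sample, theta, h, n_grid=80, n_iter=15):
    """Discretised sketch of the update
        f^{t+1}(u) = alpha_t * mean_i[ omega^t(X_i) K_h(u - X_i) / int K_h(s - X_i) ds ],
    with g_0 replaced by the empirical distribution of `sample` and integrals over
    [0, 1] replaced by a midpoint rule. Returns the grid, the last iterate and the
    criterion values l(f^0), ..., l(f^T)."""
    w = 1.0 / n_grid
    grid = [(j + 0.5) * w for j in range(n_grid)]
    # a[i][j] = w K_h(u_j - X_i) / sum_k w K_h(u_k - X_i): each row sums to one.
    a = []
    for x in sample:
        row = [w * gaussian_kh(u - x, h) for u in grid]
        total = sum(row)
        a.append([r / total for r in row])

    def smooth(f):
        # Nonlinear smoothing: N f(X_i) = exp(sum_j a_ij log f(u_j)).
        logf = [math.log(v) for v in f]
        return [math.exp(sum(aij * lj for aij, lj in zip(row, logf))) for row in a]

    def criterion(nf):
        # l(f) = -(1/n) sum_i log(theta + (1 - theta) N f(X_i)).
        return -sum(math.log(theta + (1.0 - theta) * v) for v in nf) / len(nf)

    f = [1.0] * n_grid                      # start from the uniform density
    nf = smooth(f)
    values = [criterion(nf)]
    for _ in range(n_iter):
        omega = [(1.0 - theta) * v / (theta + (1.0 - theta) * v) for v in nf]
        s = sum(omega)                      # equals n / alpha_t
        f = [sum(om * row[j] for om, row in zip(omega, a)) / (s * w)
             for j in range(n_grid)]
        nf = smooth(f)
        values.append(criterion(nf))
    return grid, f, values


# Illustrative data: theta = 0.8, alternative density 4 (1 - x)^3.
rng = random.Random(3)
theta, rho = 0.8, 4.0
sample = [u if rng.random() < theta else 1.0 - (1.0 - u) ** (1.0 / rho)
          for u in (rng.random() for _ in range(200))]
grid, f_hat, values = msl_iterations(sample, theta, h=0.1)
print(values[0], values[-1])
```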

Proof of Corollary^. We start by stating a lemma, whose proof is postponed to Appendix A.4.

Lemma 6. The function $l : \mathcal{B} \to \mathbb{R}$ is continuous with respect to the topology induced by uniform convergence on the set of functions defined on $[0,1]$.

Now, it is easy to see that $l$ is a convex function. Moreover, since the densities in $\mathcal{B}$ are bounded (recall that any $f \in \mathcal{B}$ satisfies $m \le f(\cdot) \le M/m$), it follows that $\mathcal{B} \subset \mathcal{F}$, and since $\mathcal{S}$ is linear, the set $\mathcal{B}$ is convex. Existence and uniqueness of the minimum $f^*$ of $l$ in $\mathcal{B}$ thus follow, as well as the pointwise convergence of the iterative sequence $\{f^t\}_{t \ge 0}$ to this unique minimum. □
Proof of Proposition^. For all $x, y \in [0,1]$ and for all $t$, we have

$$\begin{aligned}
|f^{t+1}(x) - f^{t+1}(y)| &= \frac{1}{\int_0^1\omega^t(u)g_0(u)\,du}\left|\int_0^1\frac{[K_h(u-x) - K_h(u-y)]\,\omega^t(u)\,g_0(u)}{\int_0^1 K_h(s-u)\,ds}\,du\right|\\
&\le \frac{1}{\int_0^1\omega^t(u)g_0(u)\,du}\int_0^1\frac{|K_h(u-x) - K_h(u-y)|\,\omega^t(u)\,g_0(u)}{m}\,du\\
&\le \frac{L}{m}\,|x - y|.
\end{aligned}$$

Furthermore, for all $x \in [0,1]$ and for all $t$,

$$m \le f^{t+1}(x) = \frac{1}{\int_0^1\omega^t(u)g_0(u)\,du}\int_0^1\frac{K_h(u-x)\,\omega^t(u)\,g_0(u)}{\int_0^1 K_h(s-u)\,ds}\,du \le \frac{M}{m},$$

so that the sequence $\{f^t\}$ is uniformly bounded and equicontinuous. By the Arzelà-Ascoli theorem, there exists a subsequence $\{f^{t_k}\}$ of $\{f^t\}$ which converges uniformly to some limit. However, this uniform limit must be the pointwise limit of the sequence, namely the minimum $f^*$ of $l$. Now, uniqueness of the uniform limit value of the sequence $\{f^t\}_{t \ge 0}$ entails its convergence.

□
A Proofs of technical lemmas

A.1 Proof of Lemma 3

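Proof. We first show (15). According to the law of large numbers, since $\mathbb{E}_{\theta,f}\big(f(X_1)/g(X_1)\big) = 1$, we have

$$\frac{S_n}{n} \xrightarrow[n \to \infty]{} 1 \quad\text{a.s.} \tag{36}$$

Hence

$$\frac{n^2}{S_n^2} \xrightarrow[n \to \infty]{} 1 \quad\text{a.s.}$$

By the dominated convergence theorem, there exists a constant $c_2 > 0$ such that for $n$ large enough

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n^2}\Big) \le \frac{c_2}{n^2},$$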

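establishing (15). Let us now prove (14). By using a Taylor expansion, we have

$$\frac{1}{S_n} = \frac{1}{n}\cdot\frac{1}{1 + (S_n/n - 1)} = \frac{1}{n}\Big[2 - \frac{S_n}{n} + \frac{(S_n/n - 1)^2}{\{1 + \gamma_n(S_n/n - 1)\}^3}\Big],$$

where $\gamma_n \in\ ]0,1[$ depends on $S_n$. Combining this with (36), we obtain

$$\frac{1}{\{1 + \gamma_n(S_n/n - 1)\}^3} \xrightarrow[n \to \infty]{} 1 \quad\text{a.s.}$$

Thus, there exist some positive constants $c, d$ such that for $n$ large enough,

$$\frac{1}{n}\Big[2 - \frac{S_n}{n} + d\Big(\frac{S_n}{n} - 1\Big)^2\Big] \le \frac{1}{S_n} \le \frac{1}{n}\Big[2 - \frac{S_n}{n} + c\Big(\frac{S_n}{n} - 1\Big)^2\Big] \quad\text{a.s.} \tag{37}$$

This implies in particular that

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n}\Big) \le \frac{1}{n}\Big[2 - \mathbb{E}_{\theta,f}\Big(\frac{S_n}{n}\Big) + c\,\mathbb{E}_{\theta,f}\Big(\frac{S_n}{n} - 1\Big)^2\Big] = \frac{1}{n} + \frac{c}{n}\,\mathrm{Var}_{\theta,f}\Big(\frac{S_n}{n}\Big) = \frac{1}{n} + \frac{c}{n^2}\,\mathrm{Var}_{\theta,f}\Big(\frac{f(X_1)}{g(X_1)}\Big).$$

Remember that the ratio $f/g$ is bounded (by $\delta^{-1}$) and thus has finite variance. Hence, there exists a positive constant $c_1$ such that for $n$ large enough

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n}\Big) \le \frac{1}{n} + \frac{c_1}{n^2}.$$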
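We now prove (16). By using again a Taylor expansion, we have

$$\frac{1}{S_n + \delta^{-1}} = \frac{1}{S_n}\cdot\frac{1}{1 + 1/(\delta S_n)} = \frac{1}{S_n} - \frac{1}{\delta S_n^2\,[1 + \beta_n/(\delta S_n)]^2},$$

where $\beta_n \in\ ]0,1[$ depends on $S_n$. We also have

$$\frac{1}{[1 + \beta_n/(\delta S_n)]^2} \xrightarrow[n \to \infty]{} 1 \quad\text{a.s.}$$

Thus, there exists a positive constant $c''$ such that for $n$ large enough

$$\frac{1}{S_n + \delta^{-1}} \ge \frac{1}{S_n} - \frac{c''}{S_n^2} \quad\text{a.s., and hence}\quad \mathbb{E}_{\theta,f}\Big(\frac{1}{S_n + \delta^{-1}}\Big) \ge \mathbb{E}_{\theta,f}\Big(\frac{1}{S_n}\Big) - c''\,\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n^2}\Big).$$

According to (37), we have

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n}\Big) \ge \frac{1}{n}\Big[2 - \mathbb{E}_{\theta,f}\Big(\frac{S_n}{n}\Big)\Big] = \frac{1}{n},$$

and it is proved above that

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n^2}\Big) \le \frac{c_2}{n^2}.$$

Thus we obtain Inequality (16), namely

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n + \delta^{-1}}\Big) \ge \frac{1}{n} - \frac{c_3}{n^2}.$$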



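Finally, we show (17). In the same way as we proved (16) above, we have for large enough $n$,

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n + 2\delta^{-1}}\Big) \ge \frac{1}{n} - \frac{c'_3}{n^2},$$

and thus

$$\Big[\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n + 2\delta^{-1}}\Big)\Big]^2 \ge \frac{1}{n^2} - \frac{2c'_3}{n^3}. \tag{38}$$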



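According to Inequality (37) (which contains only positive terms for $n$ large enough), we have

$$\frac{1}{S_n^2} \le \frac{1}{n^2}\Big[2 - \frac{S_n}{n} + c\Big(\frac{S_n}{n} - 1\Big)^2\Big]^2 \le \frac{1}{n^2}\Big[4 - 4\,\frac{S_n}{n} + \frac{S_n^2}{n^2} + 4c\Big(\frac{S_n}{n} - 1\Big)^2 + c^2\Big(\frac{S_n}{n} - 1\Big)^4\Big] \quad\text{a.s.}$$

Since

$$\mathbb{E}_{\theta,f}[S_n^2] = n\,\mathrm{Var}_{\theta,f}\Big(\frac{f(X_1)}{g(X_1)}\Big) + n^2 \quad\text{and}\quad \mathbb{E}_{\theta,f}\Big[\frac{S_n}{n}\Big] = 1,$$

we have

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n^2}\Big) \le \frac{1}{n^2}\Big[1 + \frac{1 + 4c}{n}\,\mathrm{Var}_{\theta,f}\Big(\frac{f(X_1)}{g(X_1)}\Big) + c^2\,\mathbb{E}_{\theta,f}\Big(\frac{S_n}{n} - 1\Big)^4\Big]. \tag{39}$$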
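Combining (38) and (39), we get that

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n^2}\Big) - \Big[\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n + 2\delta^{-1}}\Big)\Big]^2 \le \frac{C}{n^3} + \frac{c^2}{n^2}\,\mathbb{E}_{\theta,f}\Big[\Big(\frac{S_n}{n} - 1\Big)^4\Big]. \tag{40}$$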



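We now upper-bound the quantity $\mathbb{E}_{\theta,f}[(S_n/n - 1)^4]$. Let us denote by

$$U_i = \frac{f(X_i)}{g(X_i)} - 1.$$

We have

$$\Big(\frac{S_n}{n} - 1\Big)^4 = \Big(\frac{1}{n}\sum_{i=1}^{n}U_i\Big)^4.$$

Since the random variables $U_i$ are iid, bounded and with mean zero, we obtain

$$\mathbb{E}_{\theta,f}\Big[\Big(\frac{S_n}{n} - 1\Big)^4\Big] = \frac{1}{n^4}\big[n\,\mathbb{E}_{\theta,f}(U_1^4) + 3n(n-1)\,\mathbb{E}_{\theta,f}(U_1^2U_2^2)\big] = O\Big(\frac{1}{n^2}\Big). \tag{41}$$

Finally, according to (40) and (41), we have

$$\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n^2}\Big) - \Big[\mathbb{E}_{\theta,f}\Big(\frac{1}{S_n + 2\delta^{-1}}\Big)\Big]^2 \le \frac{c_4}{n^3}.$$

□

A.2 Proof of Lemma 4

Proof. We write

$$\frac{n}{\sum_k\hat\tau_k} = \frac{n}{\sum_k\tau_k}\Big(1 + \frac{\sum_k(\hat\tau_k - \tau_k)}{\sum_k\tau_k}\Big)^{-1}.$$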



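Let us establish that $\|\hat\tau - \tau\|_{\infty,[0,1]} = \sup_{x \in [0,1]}|\hat\tau(x) - \tau(x)|$ converges almost surely to zero. Indeed,

$$\hat\tau(x) - \tau(x) = \frac{\theta - \hat\theta}{g(x)} + \hat\theta\Big(\frac{1}{g(x)} - \frac{1}{\tilde g_n(x)}\Big),$$

and using the same argument as for establishing (26), we get that for $n$ large enough and for all $x \in [0,1]$,

$$|\hat\tau(x) - \tau(x)| \le \frac{|\hat\theta - \theta|}{\theta} + 2|\hat\theta|\,\delta^{-2}\,\|\tilde g_n - g\|_\infty \le \delta^{-1}|\hat\theta - \theta| + 3\delta^{-2}\|\tilde g_n - g\|_\infty.$$

By using the consistency of $\hat\theta_n$ and Remark 1, we obtain that $\|\hat\tau - \tau\|_{\infty,[0,1]}$ converges almost surely to zero. Now, using (25), for $n$ large enough,

$$\forall s \in [0,1], \quad 1 + s\,\frac{\sum_k(\hat\tau_k - \tau_k)}{\sum_k\tau_k} \ge 1 - s\,\frac{2\|\hat\tau - \tau\|_{\infty,[0,1]}}{1-\theta} \ge \frac{1}{2} \quad\text{a.s.}$$

We obtain that

$$\frac{n}{\sum_k\hat\tau_k} \le \frac{n}{\sum_k\tau_k} + \frac{n\,\big|\sum_k(\hat\tau_k - \tau_k)\big|}{(\sum_k\tau_k)^2}\int_0^1\Big(1 + s\,\frac{\sum_k(\hat\tau_k - \tau_k)}{\sum_k\tau_k}\Big)^{-2}ds \le \frac{2}{1-\theta} + \frac{8\,\|\hat\tau - \tau\|_{\infty,[0,1]}}{(1-\theta)^2} \le c_7 \quad\text{a.s.}$$

□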



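A.3 Proof of Lemma 5

Proof. In order to prove (32), let us consider the iid random variables $U_1, \dots, U_n$ defined as

$$U_i = \Big|K_1\Big(\frac{x - X_i}{h}\Big)\Big|.$$

For all $1 \le p \le 4$, we have

$$\mathbb{E}_{\theta,f}(U_1^p) = \int_0^1\Big|K_1\Big(\frac{x-t}{h}\Big)\Big|^p g(t)\,dt = h\int|K_1(t)|^p\,g(x+th)\,dt \le C\,h.$$

We then write

$$\mathbb{E}_{\theta,f}\Big[\Big(\frac{1}{nh}\sum_{i=1}^{n}U_i\Big)^4\Big] = \frac{1}{n^4h^4}\,\mathbb{E}_{\theta,f}\Big[\Big(\sum_{i=1}^{n}U_i\Big)^4\Big], \tag{42}$$

where

$$\Big(\sum_{i=1}^{n}U_i\Big)^4 = \sum_{i}U_i^4 + 4\sum_{i \ne j}U_i^3U_j + 3\sum_{i \ne j}U_i^2U_j^2 + 6\sum_{i,j,k\ \text{distinct}}U_i^2U_jU_k + \sum_{i,j,k,l\ \text{distinct}}U_iU_jU_kU_l.$$

And for any choice of the bandwidth $h > 0$ such that $nh \to \infty$,

$$\begin{aligned}
\mathbb{E}_{\theta,f}\Big[\Big(\sum_{i=1}^{n}U_i\Big)^4\Big] &= n\,\mathbb{E}_{\theta,f}(U_1^4) + 4n(n-1)\,\mathbb{E}_{\theta,f}(U_1^3U_2) + 3n(n-1)\,\mathbb{E}_{\theta,f}(U_1^2U_2^2)\\
&\quad + 6n(n-1)(n-2)\,\mathbb{E}_{\theta,f}(U_1^2U_2U_3) + n(n-1)(n-2)(n-3)\,\mathbb{E}_{\theta,f}(U_1U_2U_3U_4)\\
&= n\,\mathbb{E}_{\theta,f}(U_1^4) + 4n(n-1)\,\mathbb{E}_{\theta,f}(U_1^3)\,\mathbb{E}_{\theta,f}(U_1) + 3n(n-1)\,\mathbb{E}_{\theta,f}^2(U_1^2)\\
&\quad + 6n(n-1)(n-2)\,\mathbb{E}_{\theta,f}(U_1^2)\,\mathbb{E}_{\theta,f}^2(U_1) + n(n-1)(n-2)(n-3)\,\mathbb{E}_{\theta,f}^4(U_1)\\
&\le C_{15}\,n^4h^4.
\end{aligned}\tag{43}$$

According to (42) and (43), we obtain the result. □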
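The expansion of $\mathbb{E}[(\sum_i U_i)^4]$ for iid variables, with multiplicities 1, 4, 3, 6 and 1, can be checked exactly by brute-force enumeration for a small discrete distribution; when $\mathbb{E}(U_1) = 0$ it reduces to $n\,\mathbb{E}(U_1^4) + 3n(n-1)\,\mathbb{E}^2(U_1^2)$, the two-term form used for the centred variables of Lemma 3. The Python sketch below uses exact rational arithmetic; the test distributions are arbitrary and not taken from the paper.

```python
from fractions import Fraction
from itertools import product


def brute_fourth_moment(values, probs, n):
    """Exact E[(U_1 + ... + U_n)^4] for iid U_i taking `values` with `probs`,
    by enumerating all len(values) ** n outcomes."""
    total = Fraction(0)
    for outcome in product(range(len(values)), repeat=n):
        weight, s = Fraction(1), Fraction(0)
        for k in outcome:
            weight *= probs[k]
            s += values[k]
        total += weight * s ** 4
    return total


def expanded_fourth_moment(values, probs, n):
    """Same quantity from the multinomial expansion with multiplicities 1, 4, 3, 6, 1."""
    mu = [sum(p * v ** r for v, p in zip(values, probs)) for r in range(5)]
    return (n * mu[4]
            + 4 * n * (n - 1) * mu[3] * mu[1]
            + 3 * n * (n - 1) * mu[2] ** 2
            + 6 * n * (n - 1) * (n - 2) * mu[2] * mu[1] ** 2
            + n * (n - 1) * (n - 2) * (n - 3) * mu[1] ** 4)


# A non-centred example (like U_i = |K_1((x - X_i)/h)|) ...
vals = [Fraction(0), Fraction(1), Fraction(3)]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
for n in range(1, 5):
    assert brute_fourth_moment(vals, probs, n) == expanded_fourth_moment(vals, probs, n)

# ... and a centred one (mean zero): only n*mu4 + 3n(n-1)*mu2^2 remains.
c_vals, c_probs = [Fraction(-2), Fraction(1)], [Fraction(1, 3), Fraction(2, 3)]
print(brute_fourth_moment(c_vals, c_probs, 4))  # 168 = 4 * 6 + 3 * 4 * 3 * 2 ** 2
```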
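A.4 Proof of Lemma 6

Proof. Let $f$ be a function in $\mathcal{B}$ and $\{f_n\}$ a sequence of densities on $[0,1]$ such that $\|f_n - f\|_\infty \to 0$. Let us recall that every $f \in \mathcal{B}$ satisfies the bounds $m \le f \le M/m$. We have

$$|l(f_n) - l(f)| = \left|\int_0^1 g_0(x)\log\frac{\theta + (1-\theta)\mathcal{N}f(x)}{\theta + (1-\theta)\mathcal{N}f_n(x)}\,dx\right| \le \int_0^1 g_0(x)\left|\log\Big(1 - \frac{(1-\theta)[\mathcal{N}f_n(x) - \mathcal{N}f(x)]}{\theta + (1-\theta)\mathcal{N}f_n(x)}\Big)\right|dx,$$

and

$$|\mathcal{N}f_n(x) - \mathcal{N}f(x)| = \mathcal{N}f(x)\left|\exp\Big(\frac{\int_0^1 K_h(u-x)[\log f_n(u) - \log f(u)]\,du}{\int_0^1 K_h(s-x)\,ds}\Big) - 1\right| \le \frac{M}{m}\left|\exp\Big(\frac{\int_0^1 K_h(u-x)[\log f_n(u) - \log f(u)]\,du}{\int_0^1 K_h(s-x)\,ds}\Big) - 1\right|.$$

For $|y| \le \epsilon$ small enough, we have $|\log(1+y)| \le 2|y|$ and $|\exp(y) - 1| \le 2|y|$. Combining this with the fact that $f$ is bounded from below by $m$, we get that, for $n$ large enough,

$$\left|\int_0^1 K_h(u-x)[\log f_n(u) - \log f(u)]\,du\right| = \left|\int_0^1 K_h(u-x)\log\Big(1 + \frac{f_n(u) - f(u)}{f(u)}\Big)du\right| \le \frac{2\|f_n - f\|_\infty}{m}\int_0^1 K_h(u-x)\,du,$$

and thus

$$\|\mathcal{N}f_n - \mathcal{N}f\|_\infty \le \frac{4M}{m^2}\,\|f_n - f\|_\infty.$$

We finally obtain

$$|l(f_n) - l(f)| \le C\,\|f_n - f\|_\infty,$$

where $C$ is a constant depending on $h$, $K$ and $\theta$.

□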